Slashdot Mirror


Intel Core 2 'Penryn' and Linux

An anonymous reader writes "Linux Hardware has posted a look at the new Intel "Penryn" processor and how it will work with Linux. Intel recently released the new "Penryn" Core 2 processor with many new features. So what are these features, and how will they translate into benefits for Linux users? The article covers all the high points of the new "Penryn" core and talks to a couple of Linux projects about end-user performance of the chip."

99 comments

  1. Perspective by explosivejared · · Score: 5, Insightful

    "There are some new instructions that could be more convenient to use in some special cases (like the new pmin/pmax instructions). But these will have no real performance benefit."

    "So we do not plan on adding SSE4 optimizations. We may use SSE4 instructions in the future for convenience once SSE4 has become really widely supported. But I personally don't see that anytime soon..."

    I think that puts the hype over Penryn into perspective. There are some nice improvements in energy leakage and such, but it's nothing revolutionary.
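For readers unfamiliar with the packed min/max instructions mentioned in the quote, here is a minimal Python sketch of what a lane-wise minimum and maximum compute. The function names are invented for illustration and this is ordinary scalar code, not SSE4; on real hardware each call would correspond to a single instruction operating on all lanes at once.

```python
def packed_min(a, b):
    """Lane-wise minimum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return [min(x, y) for x, y in zip(a, b)]


def packed_max(a, b):
    """Lane-wise maximum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return [max(x, y) for x, y in zip(a, b)]


def clamp_lanes(values, low, high):
    """Clamp every lane to [low, high]: one packed max, then one packed min."""
    lanes = len(values)
    return packed_min(packed_max(values, [low] * lanes), [high] * lanes)


if __name__ == "__main__":
    print(packed_min([1, 9, 3, 7], [4, 2, 8, 6]))
    print(clamp_lanes([-5, 10, 300, 128], 0, 255))
```

Clamping pixel values is the kind of "special case" where such instructions are convenient, which fits the quoted view that they save a few instructions rather than change overall performance.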

    --
    I got a catholic block.
    1. Re:Perspective by SpeedyDX · · Score: 5, Interesting

      Isn't that their strategy when they use a finer fab process anyway? I remember reading an article (possibly linked from a previous /. submission) about how they had a 2-step development process. When they switch to a finer fab process, they only have incremental, conservative upgrades. Then with the 2nd step, they use the same fab process, but introduce more aggressive instruction sets/upgrades/etc.

      I couldn't find the article with a quick Google, but I'm sure someone will dig it up.

    2. Re:Perspective by jd · · Score: 1, Informative
      The greatest gains are to be made where there are the greatest latencies and least bandwidth. The CPU has not been a significant bottleneck for some time. PCI Express 2.x (which works at 5 GT/s and supports multiple roots) and HyperTransport 3 (which works at 20.8 GB/s) are obvious candidates for improving performance over the busses usually used in computers.
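The two bus figures above are quoted in different units (GT/s versus GB/s), so a rough conversion may help. The sketch below assumes PCI Express 2.x uses 8b/10b line encoding (8 payload bits per 10 bits on the wire) and that the HyperTransport 3 figure refers to a 32-bit link clocked at 2.6 GHz with double data rate, counted per direction. The function names are hypothetical.

```python
def pcie_lane_mb_per_s(transfers_per_s, payload_bits=8, line_bits=10):
    """Usable bandwidth of one PCIe lane, one direction, in MB/s.

    Assumes one bit per transfer and 8b/10b encoding by default.
    """
    payload_bits_per_s = transfers_per_s * payload_bits / line_bits
    return payload_bits_per_s / 8 / 1e6


def ht_link_gb_per_s(clock_hz, link_width_bits=32, transfers_per_clock=2):
    """Raw bandwidth of a HyperTransport link, one direction, in GB/s."""
    return clock_hz * transfers_per_clock * link_width_bits / 8 / 1e9


if __name__ == "__main__":
    per_lane = pcie_lane_mb_per_s(5e9)
    print(f"PCIe 2.x, one lane: {per_lane:.0f} MB/s")
    print(f"PCIe 2.x, x16 slot: {per_lane * 16 / 1000:.1f} GB/s")
    print(f"HT3, 32-bit link:   {ht_link_gb_per_s(2.6e9):.1f} GB/s")
```

Under these assumptions a x16 PCIe 2.x slot lands in the same order of magnitude as the HyperTransport figure, so the two numbers are closer than the raw units suggest.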

      RAM is another area that needs work. I mean, RAM is getting very slow relative to the CPU, and caches aren't big enough to avoid being saturated by modern software. The result is that there is a lot of inefficiency in extracting things from RAM. It's better than it would be with no cache at all, but it's nowhere near what it could be.
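The cache argument can be made concrete with the standard average memory access time formula: hit time plus miss rate times miss penalty. The latencies and miss rates below are made-up illustrative values, not measurements of any real chip.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Illustrative numbers only: 1 ns cache hit, 100 ns trip to RAM.
    for miss_rate in (0.01, 0.05, 0.20):
        amat = average_access_time_ns(1.0, miss_rate, 100.0)
        print(f"miss rate {miss_rate:.0%}: {amat:.1f} ns on average")
```

Even a small miss rate dominates the average when RAM is two orders of magnitude slower than the cache, which is the inefficiency being described.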

      Hard drives could also be improved. If you had intelligent drives, you could place the filesystem layer in an uploadable module and have that entirely offloaded to the drive. Just have the data DMAed directly to and from the drive, rather than shifted around all over the place, reformatted a dozen times and then DMAed down.

      I'm sure there are a million other adjustments that could be made that would be as good or better than these, but I don't see many of those being in the CPU itself. SSE4, perhaps, but it's not clear to me that a maths core in the CPU is any longer of any real value. With multiple roots to the bus, you can have a maths core in a different root with equal access to all resources and you would not need the main CPU to govern it or operations within it. The same could be true of graphics, of course. If you did that, the MPU and GPU could be truly independent compute nodes and therefore could be doing their own stuff with less CPU intervention. You'd then use barrier operations when synchronization was required.

      All in all, Intel's latest offering doesn't impress me, given everything else that is transpiring in the computing world.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Perspective by DaveWick79 · · Score: 2, Informative

      I don't have a reference for this either, but this is the message that Intel regularly conveys to the channel. You can see the usage of this release strategy starting with the later Pentium 4 CPUs and it has continued through the various renditions of the Core series processors.

    4. Re:Perspective by wik · · Score: 4, Informative

      The name you are thinking of is the "tick-tock model."

      --
      / \
      \ / ASCII ribbon campaign for peace
      x
      / \
    5. Re:Perspective by ILuvRamen · · Score: 1

      what if you can get it with the penguin laser etched onto the top heatsink plate? And don't say "then the thermal grease wouldn't work very well" cuz that's the boring answer lol.

      --
      Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
    6. Re:Perspective by recoiledsnake · · Score: 1

      Hard drives could also be improved. If you had intelligent drives, you could place the filesystem layer in an uploadable module and have that entirely offloaded to the drive. Just have the data DMAed directly to and from the drive, rather than shifted around all over the place, reformatted a dozen times and then DMAed down.

      Uhh what? Just a couple of lines above you said CPUs were overpowered, and now you want the filesystem code to run on the hard drive? Specialized hardware may be faster, but which filesystem are you gonna have running on your hard disk? NTFS? ext3? ZFS? Reiserfs?
      --
      This space for rent.
    7. Re:Perspective by asm2750 · · Score: 2, Informative

      Like an earlier post said, it's called the "Tick-Tock" strategy. One upgrade you improve the architecture, and then the next upgrade you make the fab process smaller. It's not a bad idea, but there are two questions to ask. First, could Intel hit a dead end around 2013, because 16nm is the last point on the ITRS roadmap, nullifying this strategy? Once you go even smaller, you essentially start having gates the size of atoms. And second, once quad core becomes more common, will there really be any reason for consumer-level products to go beyond that, given that very few programs take advantage of 64-bit and processor parallelism?

    8. Re:Perspective by Wavicle · · Score: 1

      There are some nice improvements in energy leakage and such, but it's nothing revolutionary.

      That's true for sufficiently brain-dead definitions of "revolutionary." Hafnium based High-K transistors are revolutionary. Instruction throughput isn't everything. Manufacturing technology needs breakthroughs too. Or did you see no point in the continuous shrinkage from 100 microns down to where we are now?

      --
      Education is a better safeguard of liberty than a standing army.
      Edward Everett (1794 - 1865)
    9. Re:Perspective by Anonymous Coward · · Score: 1, Informative

      From your ramblings, I can't tell whether you don't know what you're talking about or know just enough to make very confusing remarks.

      It's been a long time since raw computational power has driven CPU development. Almost all improvements are about further hiding latency. Your comments about RAM, disk, etc. are all about I/O latency. This has been getting worse for decades and will continue to get worse, and the majority of what CPU designers think about is how to deal with that fact.

      From your comments, I'm pretty sure the various adjustments you propose simply don't make sense. A lot of times you're just throwing out useless statements like performance being "nowhere near what it could be".. in what, a science fiction novel? You seem to have a vague notion of a distributed processing control, rather than centralized processing, but no direction on how these could effectively be coordinated more than they currently are (offload task to specialized unit, but manage overall user flow).

      The more I reread your post, the more I don't mind that Intel's offerings don't impress you.

    10. Re:Perspective by DaveWick79 · · Score: 3, Insightful

      I believe that by the time quad core becomes mainstream, i.e. every piece of junk computer at Buy More has one, 64-bit apps will also be mainstream. By 2010 every computer sold will come with a 64-bit OS that will emulate a 32-bit environment for older programs, but all the new software being developed will be transitioning to 64-bit.
      Can CPU performance hit a threshold? Sure it can. But maybe by then they will be integrating specialty processors for video encoding/decoding, data encryption, or for file system/flash write optimization, onto the CPU die. At some point nothing more will be required for corporate America to run word processors and spreadsheets, and tech spending and development will shift to smaller, virtual reality type applications rather than the traditional desktop. I think we have already reached the point where the desktop computer fulfills the needs of the typical office worker. The focus shifts to management & security over raw performance.

    11. Re:Perspective by poetmatt · · Score: 0, Troll

      I felt the same. Intel is trying to push, and this seems like definite hype; plus we all know breakthrough generations of new processors are unstable/crap (first dual cores/socket...what, what was that, 975?). I look forward to AMD's next gen of desktop processors, if that happens soon (note: what is next from AMD as far as desktop? My memory fails on that).

    12. Re:Perspective by hedwards · · Score: 1

      That isn't at all surprising, I remember AMD's 3dnow, and cyrix's extensions, and how they were supposed to revolutionize things. In the end, neither did very well, and didn't ever actually live up to the hype.

      I remember when Unreal was released, it had software rendering via 3dnow, and it was far from satisfactory, and not just in resolution, turning that down still led to problems.

    13. Re:Perspective by pat+mcguire · · Score: 1, Interesting

      Yeah, at a certain point you run up against the uncertainty principle, but I don't think that is supposed to be anything near an issue until the latter half of this century. The first limit that processors are going to hit is frequency: even at the speed of light, a signal can only travel about four inches (10 cm) during a cycle on a three gigahertz processor. While that's plenty, there are going to be problems to be resolved whenever data can't be transferred from memory in the space of a single cycle. This is unrelated to the relative speed of the memory, as I'm talking about the speed of the wires rather than the latency of various components.
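As a rough sanity check on the distance-per-cycle argument, the upper bound is just the speed of light divided by the clock frequency. The sketch below computes it; the `velocity_factor` parameter is an assumption added to model signals in real wires travelling slower than light in vacuum, and the 0.5 used in the demo is an arbitrary illustrative value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_cm(clock_hz, velocity_factor=1.0):
    """Farthest a signal can travel in one clock cycle, in centimetres.

    velocity_factor is the fraction of light speed the signal achieves
    (1.0 is the vacuum upper bound).
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    metres = SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz
    return metres * 100


if __name__ == "__main__":
    print(f"3 GHz, light speed:      {distance_per_cycle_cm(3e9):.2f} cm")
    print(f"3 GHz, half light speed: {distance_per_cycle_cm(3e9, 0.5):.2f} cm")
```

At 3 GHz the vacuum bound comes out to roughly 10 cm per cycle, which is comfortably larger than a die but on the same scale as the distance from CPU socket to memory slots.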

    14. Re:Perspective by Dzonatas · · Score: 1

      The article only focused on video decode/encode speeds, but that is not where most of the SSE4 instructions help with speed. The newer SSE4 instructions help more with vectorization and parallelization. If the encoder/decoder does not run in parallel, then most of the new SSE4 instructions won't help. If you look at Intel's TBB, you'll see exactly where most of the newer SSE4 instructions can be used!

    15. Re:Perspective by Ours · · Score: 2, Interesting

      To support what you say, Microsoft said that Vista and Windows 2008 Server were supposed to be the last OS to be available in 32-bit versions.

      --
      "You superiour intellect is no match for our puny weapons" - The Simpsons
    16. Re:Perspective by rbanffy · · Score: 1

      'cause, you know, four cores should be enough for everybody. ;-)

    17. Re:Perspective by Anonymous Coward · · Score: 0

      why would a post like this get a troll post? seems pretty suspicious to me

    18. Re:Perspective by slocan · · Score: 1

      Now let's play safe, and work with a theoretical 640 core limit.

    19. Re:Perspective by Anonymous Coward · · Score: 0

      Dammit, It's mod day for me, not meta-mod day. Who T.F. rated parent "Troll"??

    20. Re:Perspective by Anonymous Coward · · Score: 0

      Did your memory fail, or was the latency too high?

    21. Re:Perspective by InsaneProcessor · · Score: 1

      "Microsoft said that Vista and Windows 2008 Server where supposed to be the last OS to be available in 32-bit versions."
      This may be true, as M$ may never release a new OS after the massive failure of Vista. The only way it sells is to force it down the throats of those buying new computers who don't have a clue. Even many of those buyers are buying XP to replace that piece of crap.

      --

      Athiesm is a religion like not collecting stamps is a hobby.
    22. Re:Perspective by darkwhite · · Score: 1

      I think that puts the hype over Penryn into perspective. There are some nice improvements in energy leakage and such, but it's nothing revolutionary.

      Improvements in fabrication technology have nothing to do with improvements in the ISA, beyond the extent to which the ISA relies on the performance provided by the process. The process improvements in Penryn are revolutionary. 45nm on hafnium gates, with a whole slew of other process changes needed to make that work, is something that five years ago wasn't even believed possible. I recall gloom-and-doom predictions that the brick wall was at 65 nm.

      Practically, Penryn may be an incremental step, but the process behind it will give Intel huge room for improvement in the next couple of years.
    23. Re:Perspective by Emetophobe · · Score: 1

      Take a look at this old image of the Intel roadmap.

      Also, Intel has a tech page where they describe this 2 year cycle.

    24. Re:Perspective by vasqzr · · Score: 1

      The 3DNow! miniGL drivers for graphics cards would give the K6 chips a 25-30% boost in FPS, putting them right with the Pentium II at the time

    25. Re:Perspective by Cyno · · Score: 1

      I am impressed with Intel's 45nm Core 2 shrink improvements already, at a 5-10% boost per clock. And these are just their first Penryns. Intel has really turned things around, AMD has some catching up to do.. 2008 will be interesting.

    26. Re:Perspective by brunascle · · Score: 1

      actually, it's only Windows 2008 Server, not Vista. Bill Laing's "a server guy", and his announcement was about the server versions, but the media misapplied it to Vista. http://apcmag.com/6121/windows_server_gets_vista_version_itis

    27. Re:Perspective by Anonymous Coward · · Score: 0

      Isn't that their strategy when they use a finer fab process anyway? I remember reading an article (possibly linked from a previous /. submission) about how they had a 2-step development process. When they switch to a finer fab process, they only have incremental, conservative upgrades. Then with the 2nd step, they use the same fab process, but introduce more aggressive instruction sets/upgrades/etc.

      My wife does "tape-out" for Intel (processing the engineering models for exposure onto real silicon -- in other words, the final step before actual mask-making). It's a very cool job where she gets to write parallel software for a 100,000+ node cluster. She verified that this is Intel's process. It's obviously good engineering to minimize the number of variables when trying something new.

      Oh, the secrets she could tell. She gets to see EVERYTHING. This Penryn stuff is nothing compared to what's hiding in the secret, dark inner labyrinths of Intel. Too bad she has the scruples not to talk about it, even to me :-)

    28. Re:Perspective by paulatz · · Score: 1

      Yeah, at a certain point you run up against the uncertainty principle, but I don't think that is supposed to be anything near an issue until the latter half of this century.

      I think you have no idea what you are talking about. If you take silicon as an example, its crystal form has atoms separated by a fraction of a nanometer (the lattice constant is about 0.54 nm), but if you add an impurity its effect spreads over a radius on the order of 10 nm.

      So you cannot go below 10 nanometers, because separate circuits in the micro-(nano?)-processor start to interact with each other.

      It happens much before reaching the atom limit.

      This is why processor producers are pushing for multi-core CPUs; to invert Feynman's line: they know that there is not much room left at the bottom.

      And, by the way, modern computers used for desktop systems are already doing nothing 90% of the time; this is because the bandwidth between RAM and cache is far too low, not to speak of hard disk speed.

      To summarize: the way to go is many cores, programmers able to use them (have you ever heard of OMP?), much faster memory access (or equivalently much larger processor caches), and solid state disks.

      This will require dropping a lot of buzzwords, but you can create some new ones.

      --
      this post contain no useful information, no need to mod it down
    29. Re:Perspective by jwo7777777 · · Score: 1

      The point is that eventually it won't matter. Put the responsibility for all file stuff with the storage device. Just have a sufficiently standardized API and quick enough connection to it.

      The idea behind it is that specialized "application" processors in architected groups will comprise an "operating system" of sorts married with hardware that will be quicker than a generalized computer with a monolithic OS.

    30. Re:Perspective by PastaLover · · Score: 1

      Way too many people are buying Vista to call it "a massive failure". Anyway, most of the early adopters seem to be going for 64 bit anyway, so the GGP is probably correct.

  2. If video encoding/decoding is the bottleneck... by compumike · · Score: 4, Interesting

    In the article, the authors of XviD and FFMPEG aren't too optimistic about speedups. If video encoding/decoding is the bottleneck, then why not start building motherboards with a dedicated chip specialized for this kind of work, instead of trying to cram extra instructions into an already bloated CISC CPU? Doesn't make sense to me.

    Also, an earlier comment that may be useful in this discussion: Why smaller feature sizes (45nm) mean faster clock times.

    --
    Educational microcontroller kits for the digital generation.

    1. Re:If video encoding/decoding is the bottleneck... by 644bd346996 · · Score: 5, Insightful

      The place for hardware decoders is on the graphics card. Hence the reason why Linux needs to use the CPU.

    2. Re:If video encoding/decoding is the bottleneck... by Vellmont · · Score: 5, Informative


        instead of trying to cram extra instructions

      Cram? Chip designers get more and more transistors to use every year. I don't believe there's any "cramming" involved.

        into an already bloated CISC CPU?

      You're about 15 years out of date. The x86 isn't exactly a CISC CPU, it's a complex instruction set that decodes into a simpler one internally. Only the Intel engineers know how they added the SSE4 instructions, but based on the comments of the encode/decode guys, these new instructions sound a lot like the old instructions. It's not too hard to imagine that they didn't have to change much silicon around, and maybe got to re-use some old internal stuff and just interpret the new instructions differently.
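To illustrate the idea of a complex instruction set being decoded into simpler internal operations, here is a toy Python sketch. The micro-op names, the `tmp_*` temporaries and the decoding rules are all invented for illustration; this is not how Intel's decoders actually work, only the general shape of the technique.

```python
def decode(instruction):
    """Split a toy 'op dest, src' instruction into simple micro-ops.

    Operands written in [brackets] are memory references; anything else
    is treated as a register. Memory operands are turned into explicit
    load and store micro-ops around a register-only operation.
    """
    op, operands = instruction.split(maxsplit=1)
    dest, src = [part.strip() for part in operands.split(",")]
    micro_ops = []
    if src.startswith("["):
        # Memory source: load it into a temporary first.
        micro_ops.append(f"load tmp_src, {src}")
        src = "tmp_src"
    if dest.startswith("["):
        # Memory destination: read, modify, write back.
        micro_ops.append(f"load tmp_dst, {dest}")
        micro_ops.append(f"{op} tmp_dst, {src}")
        micro_ops.append(f"store {dest}, tmp_dst")
    else:
        micro_ops.append(f"{op} {dest}, {src}")
    return micro_ops


if __name__ == "__main__":
    for line in ("add rax, rcx", "add rax, [rbx]", "add [rbx], rax"):
        print(line, "->", decode(line))
```

In this sketch a register-to-register add stays a single operation, while an add with a memory destination expands into three, which is the sense in which the external instruction set is "complex" and the internal one "simpler".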

      Anyway, so why not just have a dedicated piece of silicon for this exact purpose? Partly because it'd be more expensive (you'd have to basically implement a lot of the stuff already on CPU like cache, etc), but also because it's just too specific. How many people really care about encoding video? 5% of the market? Less?

      Hardware decoding on hardware is already a reality, and has been for some time. GPUs have implemented this feature for at least 10 years. But of course it's generally not a feature that has dedicated silicon, it's integrated into the GPU. If this is the first you've heard of it, it's not surprising. The other problem with non-CPU specific accelerations is they don't ever really become standard, as there's no standard instruction set for GPUs, and even then a GPU maker may just drop that feature in the next line of cards.

      In short, specialized means specialized. Specialized things don't tend to survive very well.

      --
      AccountKiller
    3. Re:If video encoding/decoding is the bottleneck... by CastrTroy · · Score: 1

      If you are really interested in encoding video, I would think that you would have a specialized chip. My TV Tuner has a specialized chip for encoding mpeg 2, which means it can encode 12 mbit/s mpeg2 without putting any noticeable load on my processor. I'm sure it wouldn't be too difficult to build a chip specifically to encode video into MPEG 4.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    4. Re:If video encoding/decoding is the bottleneck... by ozmanjusri · · Score: 1
      I'm sure it wouldn't be too difficult to build a chip specifically to encode video into MPEG 4.

      It's probably not, but as far as I'm aware, Thomson's Mustang ASIC is the only commonly available one.

      Most hardware video encoding is done with general-purpose DSPs and specialised software.

      --
      "I've got more toys than Teruhisa Kitahara."
    5. Re:If video encoding/decoding is the bottleneck... by jibjibjib · · Score: 2, Insightful
      How many people really care about encoding video? 5% of the market? Less?

      I don't know why you seem to think video encoding is some sort of niche technical application that no one uses. A huge number of people record video on digital cameras and want to email it or upload it without taking too long. Many people now use Skype and other VOIP software supporting real-time video communication. Many people rip DVDs. Many people (although not a huge number) have "media center" PCs which can record video from TV broadcasts.

    6. Re:If video encoding/decoding is the bottleneck... by xorbe · · Score: 2, Interesting

      > Cram? Chip designers get more and more transistors to use every year. I don't believe there's any "cramming" involved.

      Someone is definitely not a mainstream CPU designer! It never all fits... ask any floor-planner.

    7. Re:If video encoding/decoding is the bottleneck... by Anonymous Coward · · Score: 1, Interesting

      It's strange that XviD doesn't think SSE4 does much for video but Intel trots out DivX6 as the show pony for SSE4 optimization and speedup.

    8. Re:If video encoding/decoding is the bottleneck... by Anonymous Coward · · Score: 1, Insightful

      >How many people really care about encoding video? 5% of the market? Less?

      Anyone who wants to video conference and doesn't have tons of bandwidth.

    9. Re:If video encoding/decoding is the bottleneck... by pleappleappleap · · Score: 2, Funny

      Hardware decoding on hardware is already a reality, and has been for some time.

      As opposed to hardware decoding on software?

      Or redundant redundancies of redundancy?

    10. Re:If video encoding/decoding is the bottleneck... by samkass · · Score: 1

      One could argue that the place for graphics cards is on the CPU. What else are you going to do with all that extra silicon real estate?

      --
      E pluribus unum
    11. Re:If video encoding/decoding is the bottleneck... by 644bd346996 · · Score: 2, Interesting

      Some workloads benefit from vector processors, and some don't. For now, it is best economically to keep vector co-processors separate from CPUs, and use the advances in chip tech to lower power consumption and add more cores to the CPU.

      For example, many server workloads are handled best by a chip like Sun's UltraSparc T1, which doesn't have any floating point capabilities worth mentioning. People running that kind of server wouldn't buy a Xeon or Opteron that had a 600M-transistor vector processor. It's a huge waste of money. Similarly, people with low-end PCs would probably never use such an integrated vector processor fully, so competition would keep that kind of CPU out of that market.

      That leaves pretty much just the gaming and scientific computation markets. Of the two workloads, the former is occasionally CPU-bound rather than GPU bound, but most of the time, the vector processor is the biggest bottleneck by far for both workloads. In that case, it is much more economical if you can upgrade the vector processor without throwing away a perfectly good CPU.

    12. Re:If video encoding/decoding is the bottleneck... by Anonymous Coward · · Score: 0

      The place for hardware decoders is on the graphics card. Hence the reason why Linux needs to use the CPU.

      For output/playing, yes, but for transcoding or editing, no. I think a separate hardware chip with a large cache and close to memory is the ideal.

    13. Re:If video encoding/decoding is the bottleneck... by samkass · · Score: 1

      In that case, it is much more economical if you can upgrade the vector processor without throwing away a perfectly good CPU.

      Is it? How much does that slot, bus, southbridge, etc., cost? CPUs are cheap! Certainly cheaper than most graphics cards. And the proximity to L1/L2 cache and computational units might make for some interesting synergy.

      --
      E pluribus unum
    14. Re:If video encoding/decoding is the bottleneck... by sumdumass · · Score: 1

      Every two years or so, when you go to upgrade, there is a new socket design or some limitation with the existing north bridge/south bridge chipsets that requires you to buy a new main board anyway. So some of the components you listed might be spent already.

      With PCI express and the bandwidth it can handle, it might be the best option to put it either on a separate daughter card or allow a separate video card to be installed and dedicated specifically to this. Either way, processor, daughter cards, or video cards, special drivers would need to be implemented in order to take advantage of it properly. Having instructions on the CPU that the OS doesn't know how to use would be somewhat troublesome in seeing an advantage. But with windows Vista, it is likely to be just as hard to implement a driver in either configuration seeing how Vista took a lot of the direct access away. I think adding a separate card that you can change co processors out with similar to the older 3d accelerators would be the best bet.

    15. Re:If video encoding/decoding is the bottleneck... by pclminion · · Score: 1

      The place for hardware decoders is on the graphics card. Hence the reason why Linux needs to use the CPU.

      Why? If you're going to be displaying the video on screen, then yeah, it makes sense to have it on the graphics card. But why can't we just have a general-purpose codec card? What if I don't want to display video, I just want to encode/decode it? Surely this is such a fundamental need that it deserves its own chip. If they can fit an encoder into a 1-pound handheld digital camcorder, why can't they put one on every motherboard?

    16. Re:If video encoding/decoding is the bottleneck... by Anonymous Coward · · Score: 0

      You're about 15 years out of date. The x86 isn't exactly a CISC CPU, it's a complex instruction set that decodes into a simpler one internally.

      Wikipedia: "CISC is a microprocessor instruction set architecture (ISA) in which each instruction can execute several low-level operations".

      It sounds like x86 is, by your own description of it, *exactly* a CISC CPU.

      I know after CISC was coined (in response to RISC) that nobody likes to admit they've got CISC, for some reason, but let's call a spade a spade here.

      In fact, CISC doesn't mean bad. Most CISC designs were awesome: System/360, VAX, PDP-11, and M68K all rocked.

  3. Re:I can tell you how by idiotwithastick · · Score: 2, Informative

    Driver problems on a processor compatible with current motherboards?

  4. More Useless Options by hattable · · Score: 5, Insightful

    "So we do not plan on adding SSE4 optimizations. We may use SSE4 instructions in the future for convenience once SSE4 has become really widely supported. But I personally don't see that anytime soon..."

    This just reminds me of CONFIG_ACPI_SLEEP. About 2 times a month I am staring at this option wondering if I will ever get to use it. Some things just are not worth developer time to implement.
    --
    OMG facts!
  5. If Intel had a better bus then they would not need by Joe+The+Dragon · · Score: 0, Offtopic

    If Intel had a better bus then they would not need so much L2

  6. Penguin by proudfoot · · Score: 1

    Am I the only one who read Penryn as penguin?

    1. Re:Penguin by smittyoneeach · · Score: 4, Funny

      You, and a certain "ronery" fellow in North Korea.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    2. Re:Penguin by apathy+maybe · · Score: 0, Redundant

      I read it as "porn"...

      I miss my GF.

      --
      I wank in the shower.
  7. Remember MMX ? by 1888bards · · Score: 4, Informative

    I seem to remember Intel doing this when they released the 'first' MMX instructions in the Pentiums. That time they had actually doubled the L1 cache from 16k to 32k in the new Pentiums, but they somehow managed to convince/fool everyone that the performance gain was a result of MMX. Very sneaky/clever.

    1. Re:Remember MMX ? by virtualnz · · Score: 1

      MMX Only allowed extra Multi-media capabilities, so was still for the most part pretty useless on most machines, especially in a bus environment. More Intel marketing.

      --
      Look Forge | Free Classifieds Buy and Sell http://www.lookforge.com/
    2. Re:Remember MMX ? by xZgf6xHx2uhoAj9D · · Score: 1

      MMX was not useless. Despite its marketing name, it didn't have a whole lot to do with multimedia (though it did have obvious applications in multimedia). It was x86's initial introduction to vector/SIMD instructions. The ability to perform the same instruction on 4 numbers at once (rather than using a loop) was a huge boon. Intel might have marketed it strangely, but to some degree it was Intel playing catch-up to other architectures which had already added vector instructions.

      It's true though that we didn't really have vectorizing compilers back then (and we still don't have many good ones: vectorizing is hard) so any programs (mostly games) that used MMX would have had hand-coded assembly, which limited its use a bit. But still, MMX and later 3DNow! were and are quite useful.
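The "same instruction on 4 numbers at once" idea can be emulated in plain Python by packing four 16-bit lanes into one 64-bit integer and adding them with a single arithmetic expression. This is only a sketch of the concept with made-up helper names; it is not MMX code, and a real packed add does this in hardware in one instruction.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of every lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF   # everything except the top bits


def pack(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 lowest."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} values")
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes at once, wrapping within each lane.

    The top bit of each lane is handled separately with XOR so that a
    carry out of one lane can never spill into its neighbour.
    """
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    return partial ^ ((a ^ b) & HIGH_BITS)


if __name__ == "__main__":
    total = packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40]))
    print(unpack(total))
```

The loop-free `packed_add` is the part that corresponds to the vector instruction; `pack` and `unpack` only exist here to move data in and out of the emulated register.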

  8. Re:I suppose this means we don't need to ask... by RuBLed · · Score: 0

    I'm contemplating on what Force power I should use on you.. let me see.. we have

    1. That's no moon = Force Choke
    2. But does it run Linux = hmmm....

  9. Being an Early Adopter Sucks by ocirs · · Score: 3, Insightful

    These guys are pretty much saying that they don't really intend to optimize the code for Penryn because very few processors will have SSE4, and even then they don't expect much performance improvement. I'm still waiting for decent 64-bit drivers for half of my hardware........ most early adopters pay a premium for features that aren't really utilized at first, and by the time the software catches up the hardware is dirt cheap. However, Penryn (except for the extreme edition) is an exception, since it is priced at a point where it is worth it to pay the extra buck or two for the extra features that are not going to have much impact till years later when the software catches up. I'm really looking forward to Nehalem though; the architecture update is going to bring significant improvement in performance without much to do with software optimization.

    1. Re:Being an Early Adopter Sucks by DAldredge · · Score: 1

      What hardware do you have that doesn't have 64 bit drivers?

    2. Re:Being an Early Adopter Sucks by ArcherB · · Score: 1

      What hardware do you have that doesn't have 64 bit drivers?

      Adobe Flash and Opera.

      OK, it's not hardware or even drivers, but it's enough to make me regret installing 64-bit Ubuntu.

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    3. Re:Being an Early Adopter Sucks by Anonymous Coward · · Score: 0

      Google Ubuntu 32 bit chroot. For the speed benefits of 64 bit, it ends up being worth the trouble IMHO

    4. Re:Being an Early Adopter Sucks by Anonymous Coward · · Score: 0

      32-bit opera installs just fine on 64-bit ubuntu. Flash works just fine in 64-bit firefox on 64-bit ubuntu. Java requires a little work but it is still doable under 64-bit firefox. Gutsy will even ask if you want to install flash when you visit a site that uses it (and installs it successfully, I might add).

    5. Re:Being an Early Adopter Sucks by Anonymous Coward · · Score: 0

      netgears wireless pci cards
      huawei modems have non working drivers

    6. Re:Being an Early Adopter Sucks by ArcherB · · Score: 1
      32-bit opera installs just fine on 64-bit ubuntu. Flash works just fine in 64-bit firefox on 64-bit ubuntu. Java requires a little work but it is still doable under 64-bit firefox. Gutsy will even ask if you want to install flash when you visit a site that uses it (and installs it successfully, I might add).

      I'm still running swiftfire or swiftfox or whatever that allowed it. I haven't tried Opera since upgrading to 7.10, but just try googling "opera 64-bit ubuntu 7.10". Take this site for example:

      Be aware before you install the 64bit version that you will not be able to install Flash, Opera, Wine, Komodo Edit, or any of the new cool Adobe Air products. Boy this is got me where it hurts being a web developer. Now none the less, most of these can be installed by following the tutorials for installing on a 64bit machine, but what I would really love to see in future versions is by default, Ubuntu have the capability of installing and running 32 and 64 bit versions of software. Now I've got no clue how one would begin creating such a work of art, but Apple did it, and I have full faith in the Ubuntu community. There are no tutorials or scripts that need to be run for the 32-bit version; It just works.

      I still have not found a way to get freenx server running on it, but I have not looked in a couple of weeks. I'm sure there is a way, but VNC is working for now.

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    7. Re:Being an Early Adopter Sucks by ocirs · · Score: 1

      Emphasis on DECENT. My nVidia 7900GT driver still doesn't let me have a dual monitor setup without completely messing up the display on one screen, and it's STILL not fixed yet. The creative sound drivers suck badly, some of my games still can't utilize the 3d features of the sound card, and the sound quality is crap, I was better off using my onboard sound card. There wasn't a driver for my dlink 530tx gigabit card for a really long time, there's a driver out now but I can't access half the features like remote wake and such. And then there are all these accessories such as my iogear skype conferencing mic that will probably never work on x64 simply because the manufacturer is small and underfunded.

  10. Re:If Intel had a better bus then they would not n by Predius · · Score: 5, Insightful

    Unless the bus and ram start running faster than the cpu, cache will have a place in the design. And when die space is as cheap as it is for Intel now, why NOT use it for more cache?

  11. Re:Man thats fast by Anonymous Coward · · Score: 0

    dude, are you like 13 years old or what?

  12. sse4 by andreyvul · · Score: 1

    Intel bothered to add crc-32 but they didn't add md5 or sha-1: wtf?
    seriously, sha-1 in microcode would be hella fast

    --
    proud caffeine whore
    1. Re:sse4 by Anonymous Coward · · Score: 1, Informative

      md5 and sha1 are already compromised algorithms. crc32, on the other hand, is not the kind of algorithm that gets obsoleted and discarded because of new research.

    2. Re:sse4 by vidarh · · Score: 1

      For most uses of md5 and sha1 in modern applications it makes very little difference that it's possible to manufacture collisions. That said, I'd really like to know what kind of applications require enough md5 and sha1 generation steps that they cause enough load to be worth dedicated instructions.
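
      For anyone wondering what the crc-32 instruction actually computes: as far as I know the SSE4.2 CRC32 instruction uses the Castagnoli polynomial (CRC-32C), not the one zlib and Ethernet use. A slow bit-at-a-time Python sketch of the usual CRC-32C convention (all-ones initial value and final XOR) looks roughly like this. The function name is made up for illustration, and the hardware instruction only does the accumulate step, leaving the initial value and final XOR to software.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: start from all ones, process LSB first, invert at the end."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was set, fold the polynomial back in.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

      The standard check input `b"123456789"` should come out as `0xE3069283`. The inner loop of eight shift-and-XOR steps per byte is the part a dedicated instruction replaces.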

  13. Via has it for years: AES, SHA, and much more... by rpp3po · · Score: 3, Insightful

    My 1.2 GHz (C7) Epia board runs a 28 MB/s file server over Gigabit LAN - with transparent AES decryption (dm-crypt).... :)

  14. Apple's change to LLVM by tyrione · · Score: 4, Interesting

    makes a lot more sense with these latest processors. Sure the SSE 4 instructions won't be that immediately useful to Linux. They sure as hell will be for OS X Leopard.

  15. Does it blend? by Anonymous Coward · · Score: 0

    But does the new Intel chip blend?
    That is the question.

  16. Re:If Intel had a better bus then they would not n by Kyojin · · Score: 1

    Actually, unless the bus and ram run faster than the cpu or the cache memory, cache will have a place in the design. If ram and bus are faster than cache, there is no point having cache. If ram and bus are faster than cpu speed, your cpu is too slow and you'll get no benefit from having such a fast ram and bus speed.
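
    The textbook way to put numbers on this is the average memory access time: hit time plus miss rate times miss penalty. A small Python sketch follows; the cycle counts and miss rate in the usage note are invented purely for illustration, not measurements of any real chip.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles.

    hit_time     -- cycles to get data from the cache
    miss_rate    -- fraction of accesses that miss the cache (0.0 to 1.0)
    miss_penalty -- extra cycles to fetch from RAM over the bus on a miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty
```

    With a made-up 3-cycle cache, a 200-cycle trip to RAM and a 5% miss rate, `average_access_time(3, 0.05, 200)` is about 13 cycles, versus 200 for going to RAM every time. If RAM and bus were somehow as fast as the cache (a 3-cycle penalty), the result is about 3.15 cycles, slightly worse than the 3 cycles of talking to RAM directly, which is the "no point having cache" case.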

  17. Re:I can tell you how by renegadesx · · Score: 1

    2007 called, you have been invited to join in.

    --
    Make SELinux enforcing again!
  18. x86 not CISC?! by porpnorber · · Score: 5, Interesting

    x86 has a hella complex instruction set, and it's decoded in hardware, not software. On a computer. So: it's a CISC. A matter of English, sorry, not religion. Sure the execution method is not the ancient textbook in-order single-level fully microcoded strategy - but it wasn't on a VAX, either, so you can't weasel out of it that way. ;)

    Of course, the problem isn't with being a CISC, anyway. Complex instruction sets can save on external fetch bandwidth, and they can be fun, too! It was true 25 years ago, and it's still true now. CISC was never criticised as inherently bad, just as a poor engineering tradeoff, or perhaps a philosophy resulting in such poor tradeoffs.

    The real point is twofold: first, that the resources, however small, expended on emulating (no longer very thoroughly) the ancient 8086 are clearly ill-spent. While this may have come about incrementally, it could all by now be done in software for less. And second, while we don't write assembly code any more, we do still need machines as compiler targets; and a compiler wants an ISA that is either simple enough to model in detail (the classic RISC theory) or orthogonal enough to exploit thoroughly (the CISC theory), or both. Intel (and AMD, too, of course; the 64 bit mode is baffling in its baroque design) gives us neither; x86 is simply not a plausible compiler target. It never was, and it's getting worse and worse. And that is precisely why new instructions are not taken up rapidly: we can't just add three lines to the table in the compiler and have it work, as we should be able to do; we can't just automatically generate and ship fat binaries that exploit new capabilities where they provide for faster code, as must be possible for these instruction set increments to be worthwhile.

    Consider, for example, a hypothetical machine in which there are a number of identical, wide registers, each of which can be split into lanes of any power of two width; and an orthogonal set of cleanly encoded instructions that apply to those registers. CISCy, yes, but also a nice target that we can write a clean, flexible, extensible compiler back end for. Why can't we have that, instead? (Even as a frikkin' mode if back compatibility is all and silicon is free, as you appear to argue!)

    It shouldn't be a question of arguing how hard it is or isn't for the Intel engineers to add new clever cruft to the old dumb cruft, but one of what it takes to deploy a feature end-to-end, from high level language source to operations executed, and how to streamline that process.

    So, sure, give us successive extensions to the general-purpose hardware, but give them to us in a form that can actually be used, not merely as techno-marketroids' checklist features!

    1. Re:x86 not CISC?! by shadanan · · Score: 0

      Everything you say is true, however, the GP is also correct. The x86 chips have a RISC architecture below the CISC "interface". CISC instructions are mapped to a lower level microcode. It has been this way at least since the Pentium processors. MMX opcode for instance is an extension to the CISC instruction set which maps directly to Intel's RISC microcode. From your point of view, the fact that there is an underlying RISC architecture is irrelevant since you were talking about the compiler preparing code for the processor which is transparent to Intel's RISC microcode. Nevertheless, the advantages of having a RISC architecture from the hardware design point of view are also realized by Intel's approach. It's effectively an abstraction, similar to a virtual machine.

    2. Re:x86 not CISC?! by Jay+L · · Score: 1

      Thanks for the explanation. I've been wondering why, in late 2007, everything I see is still optimized for i686 (or even i586). I upgraded from Core to Core2, and couldn't figure out why I didn't need to recompile everything to take full advantage; that's when I noticed that Tiger was still using a gcc that didn't even have Core!

      It sounds like Penryn has a bunch of slightly-neat features that we'll start taking advantage of sometime in 2025.

    3. Re:x86 not CISC?! by porpnorber · · Score: 1

      I'm afraid I find this a little hard to interpret. It's a traditional thing to have flamewars about the 'point' of RISC (simply because - we're back in history here - the argument of H&P was 'measure twice, cut once' and RISC - an object, a project and a design schema - was just an application of that philosophy at a particular point in the technology curve), but the idea of RISC was most specifically to tune the visible ISA to the needs of a compiler back end (along with the execution engine, technological constraints, etc.) because the virtualisation layer that translates the source language into execution steps is better done in software than in hardware. Why? Because (a) in the modern world the source code is in a high level language and not in assembler, so optimising for assembler has no point; (b) the translation does not change once the source code and the target processor are fixed, and so can be done statically (just in time compilers actually make this false, as written; with the right hardware support they should substantially outperform static compilers on many tasks. But they actually strengthen my argument in the end, because they are more sensitive to unnecessary ISA complexity than classical compilers, since they themselves need to be faster and lower complexity pieces of software); and (c) because the amount of end-user computation they do per gate per cycle approaches zero.

      You say, well, but the clever thing about x86 is that behind the hardware translator there is a RISC core. Well, yeah, and you can think of classical microcoded machines that way, too, and it would be a darling method to implement a VAX, as well. The point is, rather, that it is not in any sense a RISC until that translation layer, redoing on every execution a nearly static translation, is moved into software, because that was the point.

      You know, the x86 even has protected code segments. They could put a binary translator in firmware, have an entirely new ISA without a legacy decoder at all, and still be binary compatible. I find it really hard to understand why things are still being done with explicit layers in hardware. Of course, and relatedly, I also find it hard to understand why Intel don't give their compilers away for free, since they are a hardware company and the compilers are, let's face it, just the device drivers for their product....

    4. Re:x86 not CISC?! by Neil+Hodges · · Score: 1

      Heck, for my Core 2 Duo, I need to use the -march=nocona flag for compilation, and if I can recall correctly, that was originally added for the Prescott or similar: "Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support." Though, I doubt the software interface to the CPU is that different.

      - Neil

    5. Re:x86 not CISC?! by Anonymous Coward · · Score: 0

      Thanks for the explanation. I've been wondering why, in late 2007, everything I see is still optimized for i686 (or even i586). I upgraded from Core to Core2, and couldn't figure out why I didn't need to recompile everything to take full advantage; that's when I noticed that Tiger was still using a gcc that didn't even have Core!

      Don't thank him -- his explanation was deeply misleading in a lot of ways. In particular, lines like:

      new instructions are not taken up rapidly: we can't just add three lines to the table in the compiler and have it work, as we should be able to do; we can't just automatically generate and ship fat binaries that exploit new capabilities where they provide for faster code, as must be possible for these instruction set increments to be worthwhile.

      betray a deep misunderstanding of the issues involved. Here's some things wrong with that claim:

      1. You actually can just add table entries and get a minimal level of support (the ability to use the instructions in inline assembly) in the compiler. Many people learned the SSE4 opcode names months ahead of Intel's release of documentation because they popped up in GCC header files due to some of Intel's contributions to GCC. The parent poster seems to think this is an amazingly difficult task when in fact it's quite simple. It's not hard at all to make a backend able to output new x86 opcodes.

      2. You cannot just add lines to a table in the compiler backend and suddenly get truly new capabilities. The hard part in supporting a genuinely new instruction (i.e. one which does something fundamentally different from an existing instruction) in a compiler is the semantic mapping from a line of high level language code to low level machine instructions. There is no instruction set architecture in existence which makes this as easy as adding table entries! It's simply not possible to do that because it has nothing to do with the instruction set and everything to do with how compilers work.

      3. The new instructions being added to x86 are almost always extensions of SSE, and typically are targeted at very specific media applications. For example, one of the new SSE4 instructions is 'sum of absolute differences', a math operation useful mainly to motion detection algorithms in video encoders. It's never going to be much use outside software which is (a) always written in hand-optimized assembly and (b) is typically made available as a systemwide library. The number of programs which would benefit from making GCC automatically emit these instructions is extremely small; that's the true reason why it isn't done, not because it's somehow incredibly hard just because it's x86.
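
      For anyone who hasn't met the operation in point 3, here is a plain Python sketch of a sum of absolute differences and the kind of block-matching search a video encoder might build on it. The function names and the one-dimensional "frame" are hypothetical simplifications for illustration; real encoders work on 2-D pixel blocks in hand-tuned assembly.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Add up |a - b| over two equally sized blocks of pixel values."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_offset(reference, window):
    """Slide `reference` across `window` and return (offset, sad) of the best match.

    The offset with the smallest SAD is where the reference block most
    likely moved to; ties go to the earliest offset.
    """
    n = len(reference)
    if n == 0 or n > len(window):
        raise ValueError("reference must be non-empty and fit inside window")
    candidates = (
        (offset, sum_of_absolute_differences(reference, window[offset:offset + n]))
        for offset in range(len(window) - n + 1)
    )
    return min(candidates, key=lambda pair: pair[1])
```

      With `reference = [10, 20, 30, 40]` and `window = [0, 0, 10, 21, 30, 40, 0]`, `best_offset(reference, window)` returns `(2, 1)`: the block matches best two positions in, with a total difference of 1. The inner sum is the part a dedicated instruction speeds up, which is why it matters to encoders and to very little else.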

      By the way, you don't need to recompile everything to take full advantage of Core/Core2. Intel did such a good job that in practice there's little point in optimizing specifically for those families. Mostly, with them, -Os (optimize for size) is the best choice: it reduces the cache footprint of the code. Unlike a lot of RISCs (and some x86 designs like the P4), you can rely on the CPU to do a damn good job of dynamically scheduling your code at runtime.

    6. Re:x86 not CISC?! by ealar+dlanvuli · · Score: 1

      Uhm, did you just claim that sum of absolute differences is a pointless instruction? ...

      Sean

      --
      I live in a giant bucket.
  19. LLVM == Hot Air by Anonymous Coward · · Score: 1, Interesting

    Since 2002 we've been hearing about LLVM. It sounds like something generally "vaguely very cool TM", but when you download it, it's a bunch of "optimization strategies framework, etc, etc". Still nothing practical though. When you can compile the kernel with LLVM, it works and it is not 50 times slower than gcc, wake me up. It seems that in the best case scenario LLVM would require at least 7 more years of heavy development before it gets there, if ever at all. If you want to invest in hot air and base your future decisions on the existence of this hypeware, fine. Yawn.

    1. Re:LLVM == Hot Air by pavon · · Score: 4, Informative

      Apple used LLVM to improve the performance of software-fallbacks for OpenGL extensions by a hundredfold in Leopard, and a big part of that was because it was good at optimizing high-level routines depending on the low-level features of the chip, such as Altivec/SSE2, 32bit/64bit, PPC/x86 etc. So it stands to reason that, to the extent that SSE4 is useful, LLVM will make good use of it, just like it did for other extensions.

      That sounds pretty practical to me.

    2. Re:LLVM == Hot Air by Slashcrap · · Score: 1

      Apple used LLVM to improve the performance of software-fallbacks for OpenGL extensions by a hundredfold in Leopard, and a big part of that was because it was good at optimizing high-level routines depending on the low-level features of the chip, such as Altivec/SSE2, 32bit/64bit, PPC/x86 etc. So it stands to reason that, to the extent that SSE4 is useful, LLVM will make good use of it, just like it did for other extensions.

      If a new compiler frontend/backend/whatever improved the performance of those routines 100x, it's because the original routines were horribly inefficient. That is a simple fact and it is still true even when Apple is involved.
    3. Re:LLVM == Hot Air by tyrione · · Score: 1

      If a new compiler frontend/backend/whatever improved the performance of those routines 100x, it's because the original routines were horribly inefficient. That is a simple fact and it is still true even when Apple is involved.
      Apple is driving the costs behind LLVM. They are accelerating its development goals and no, GCC was not capable of providing the improvements to OpenGL and Quartz that Apple needed.

      Apple buys CUPS and makes sure it remains under the same licensing while paying the salaries of its development staff and also extending its Apple-specific functionality for OS X Leopard.

      I'd rather read up on what has changed in LLVM 2.0 and in future releases (LLVM 2.2, Feb 2008) than compare it to the LLVM branch available in, say, Debian (LLVM 1.8b-1), which is quite old.
  20. Who gives a shit? by Sycraft-fu · · Score: 4, Insightful

    Seriously, I get tired of the AMD fanboy "Well if Intel did this they wouldn't have to do that," or "Intel is cheating by doing processors this way instead of that way." So understand this: None of that shit matters. The only thing that matters to the end user is performance for the dollars. That's it. You can bitch and scream all you like about how doing things a different way is theoretically better, what matters is actual, real performance. In that category, the Core 2 is very good. It's a damn fast chip for a good price. That's all it needs to be. I don't care about pissing matches over how it is done, only that in the end it works well for the things I do. Doesn't matter if there's a theoretical situation it's bad at, if that's not one I encounter, I don't care.

    Also as for bus speed, you might note that the real limiting factor is RAM speed. It is pricey to get faster RAM, and that's ultimately where you've got to go for non-cached data. You can build as fast a bus as you like, if you are waiting on the RAM it gains you little.

  21. Re:If Intel had a better bus then they would not n by Anonymous Coward · · Score: 0

    The returns from increasing cache size diminish very quickly. Cache hits will be very close together. Cache misses will be very far away. With this model, adding another couple of MB does not gain you anything.

  22. Re:Via has it for years: AES, SHA, and much more.. by InsaneProcessor · · Score: 1

    You forgot to include "at a fraction of the price that any Intel equivalent (Oh, I am sorry, Intel can't do SHA or AES that fast) would cost.".

    --

    Athiesm is a religion like not collecting stamps is a hobby.
  23. All That Matters Is... by Nom+du+Keyboard · · Score: 1
    All that matters is, does it run Linux?

    It does, end of discussion. Everything else is simply about applications.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  24. Then why iNTEL? by Joseph_Daniel_Zukige · · Score: 1

    If you want orthogonal, why not use an existing non-intel CPU?

    Part of the problem is that we (still) don't really know how to design a CPU that is easy to compile fast code for (e. g., in all situations).

    1. Re:Then why iNTEL? by porpnorber · · Score: 1

      Intel wins in the market partly because it rides on Microsoft's coat tails (why Microsoft wins in the market is another long story, of course), and partly because it has fabrication technology that is actually sufficiently better than the competition to dominate most negative effects of architectural decisions. That in turn is because of economies of scale, and I understand was originally bootstrapped through their memory business, rather than by CPUs as such. And if it wins so utterly in the market, then it is very hard indeed not to be dragged along. Intel itself actively suffers from this; non-x86 Intel offerings have rarely fared as well as Intel hoped.

      So yes, of course, compilation is an inherently hard problem. But that doesn't make x86 a good approach, any more than the fact that we don't yet have a complete understanding of the fundamentals of physics justifies a return to a cosmology of crystal spheres. Whatever the right technical solution is, we can be certain that it isn't x86. The Intel architecture dominates for other reasons entirely.

    2. Re:Then why iNTEL? by jwo7777777 · · Score: 1

      momentum