Slashdot Mirror


Hyper-Threading Explained And Benchmarked

John Martin writes "2CPU.com has posted an updated article about Hyper-threading performance. They discuss the technology behind it, provide benchmarks, and make observations on what the future holds for hyper-threading. It's actually an easy, interesting read. Of note, they'll be publishing Part II in the near future which will detail hyper-threading performance under Linux 2.6. Hardware geeks will probably appreciate this."

10 of 245 comments

  1. Capsule summary. by Anonymous Coward · · Score: 1, Insightful

    Hyperthreading helps increase efficiency when applications are coded for it and it is enabled. As better caches and busses get built into future CPUs, hyperthreading will also get better.

  2. Celery by Chris Siegler · · Score: 4, Insightful
    We saw a whopping 30% decrease in encoding time with HT enabled on the 3.2GHz P4C. We were using an application that is certainly multi-threaded in TMPGEnc, so each logical processor had plenty of work to do and they both had plenty of bandwidth available to share.
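
    For scale, a "30% decrease in encoding time" is a bigger throughput gain than the number suggests. A minimal sketch of the conversion (the function name is made up for illustration; only the 30% figure comes from the quote above):

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Turn a fractional cut in run time (0.30 for 30%) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # The same job now takes (1 - reduction) of the old time,
    # so throughput rises by the inverse of that fraction.
    return 1.0 / (1.0 - reduction)


ht_speedup = speedup_from_time_reduction(0.30)
print(f"{ht_speedup:.2f}x")  # 1.43x
```

    So the quoted result is roughly 43% more frames encoded per unit of time, assuming the 30% figure refers to wall-clock time for the same job.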

    That's pretty cool, but if your primary concern is encoding, then there are some things to keep in mind. A Celeron is much cheaper than a P4 with hyperthreading ($90 for a 2.6GHz Celeron, and $170 for a P4 2.6C). And if the app you're using doesn't support HT, then a Celery will likely encode faster than a P4 with HT on. HT can also reveal nasty bugs in some drivers (my HDTV card is an example). So unless you're playing games, the P4 is just added expense.

    1. Re:Celery by Glonoinha · · Score: 2, Insightful

      $80 difference on a $700 machine (assumes a usable amount of RAM, a real video card, a usable performance hard drive, and a legit copy of XP Pro; XP Pro gives you the best performance on the SMT chips, I have seen roughly 5%-10% gains) means that for every 8 P4 2.6GHz HT machines you were going to buy, you can buy 9 Celeron 2.6GHz machines. Even if you go display-less (no monitors) and use a free OS (Linux or recycled Win2000Pro CDs), you are talking $500 absolute minimum, which works out to 7 Celeron boxes for the same price as 6 P4 boxes. I don't think my honey is going to fall for the 'but I need another 7 computers' line again this year.
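
      The box counts can be checked with a few lines of integer arithmetic. This is only a sketch, and it assumes the quoted system price ($700 or $500) is for the P4 box, with the Celeron box being the same machine minus the $80 CPU difference. The function and constant names are made up for illustration.

```python
# Assumption: the quoted system price is the P4 box; the Celeron box is
# the same machine with the $80 CPU price difference taken off.
CPU_PRICE_DIFFERENCE = 80


def celeron_boxes_for_budget(p4_box_price: int, p4_box_count: int) -> int:
    """How many Celeron boxes fit in the budget for a given number of P4 boxes."""
    budget = p4_box_price * p4_box_count
    celeron_box_price = p4_box_price - CPU_PRICE_DIFFERENCE
    return budget // celeron_box_price  # whole machines only


full_system = celeron_boxes_for_budget(700, 8)  # 5600 // 620
bare_bones = celeron_boxes_for_budget(500, 6)   # 3000 // 420
print(full_system, bare_bones)  # 9 7
```

      Both figures match the post under that assumption. If the quoted price were the Celeron box instead, the same budgets would buy only 8 and 6 Celerons.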

      At $80 difference, I don't see the price difference being worth it. Particularly given a two year lifespan wherein apps will be developed to get that 30% performance boost we see in a few of the charts (i.e., the programs that are multithreaded and SMT friendly).

      Then again if we applied the $80 towards another half gig of memory, tested same price boxes but the Celeron had another 512M of RAM ... I can see the Celeron simply dominating the P4.

      --
      Glonoinha the MebiByte Slayer
  3. Re:Ever buy a car with auto-everything? by Dominic_Mazzoni · · Score: 5, Insightful

    Whether it's something obvious like the Pentium off by 1+1=1.9999943 error

    The Pentium math bug was with division, not addition, and it only occurred in very specific circumstances. So while it supports your general point that complicated systems are more difficult to debug, that wasn't a very good example of an "obvious" bug. Careless, yes.

    One thing that was good for the industry was to move away from the complex instruction set (CISC) towards a reduced set of instructions (RISC), and we have seen the speed improvements as well as a general reduction in hardware bugs since that time.

    You do realize that Intel x86 processors are still CISC, right? (OK, actually internally they do execute things very much like a RISC chip, but the instruction set is still CISC, and modern x86 processors are certainly not any _simpler_ for having some RISC-like elements to them.)

    Besides, RISC chips don't actually have fewer instructions. Most of them these days have more. The difference between CISC and RISC is that RISC chips don't have certain complicated, slow instructions, but rather break these up into smaller pieces. For example, CISC processors usually have an instruction to move memory-to-memory while RISC only moves memory-to-register and register-to-memory. Also, CISC processors often have a division instruction while many RISC processors instead just have a multiplicative inverse instruction (so to compute a/b you instead compute a*inv(b)).
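
    To make the a*inv(b) idea concrete, here is a rough software sketch that divides using only multiplies and subtracts, with Newton-Raphson refinement of the reciprocal. It is not a description of how any particular chip implements its reciprocal instruction. The function names, the starting estimate, and the iteration count are all assumptions picked for illustration.

```python
import math


def reciprocal(b: float, iterations: int = 5) -> float:
    """Approximate 1/b with Newton-Raphson, using no division operator."""
    if b == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Split |b| into m * 2**e with m in [0.5, 1), so one starting guess fits all inputs.
    m, e = math.frexp(abs(b))
    # Linear first guess for 1/m on [0.5, 1); the constants are 48/17 and 32/17.
    x = 2.823529411764706 - 1.8823529411764706 * m
    for _ in range(iterations):
        # Newton step for f(x) = 1/x - m; the error roughly squares each pass.
        x = x * (2.0 - m * x)
    result = math.ldexp(x, -e)  # undo the scaling: 1/b = (1/m) * 2**-e
    return -result if b < 0 else result


def divide(a: float, b: float) -> float:
    """Compute a/b as a * inv(b), as described above."""
    return a * reciprocal(b)


print(divide(10.0, 4.0), divide(1.0, 3.0))
```

    The result can differ from a true divide in the last bit or two, which is one reason real hardware that takes this route needs extra care to get correctly rounded answers.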

    But to add Hyperthreading, an untested and unproven technology which can guarantee no more than a 12% speed improvement, is folly. Better to amp the CPU clock and deal with a known like heat than to risk your company's livelihood on letting the CPU figure out which thread is which. That is something an OS is much more reliable in handling.

    Now that's just ridiculous. Hyperthreading is not untested or unproven. Similar ideas have been discussed in academic papers for years; Intel was just the first to put it into a modern CPU. It's hardly untested, either - Intel started seeding the first Hyperthreading-capable processors what, two years ago now? At that point I wouldn't have suggested running a mission-critical application on a machine with Hyperthreading enabled, but now? You'd be crazy not to if it actually speeds up the application you need to run.

    The reality is that in order to advance the speed of computer processors, it's necessary to make them more complicated.

  4. Re:Future prognosis for HT by sql*kitten · · Score: 4, Insightful

    Unfortunately, historically CPU speed has increased faster than memory bandwidth. That's why we've had ever more layers of cache added to our systems, to make up for the relative deficiency.

    Aye. Sun has big plans for CMT, which one of their sales reps was quick to tell us all about, up to 32 SPARC cores on one chip. That'll work well in the lots-of-small-tasks model where you can take advantage of direct access (say between disk cache and network card) on FirePlane with very simple code (like a webserver) that can execute out of the processor's cache. But we're heavy database users, and the first question he got asked was, are you seriously telling us Sun is about to make its memory bandwidth an order of magnitude greater? He couldn't answer that question. Now, that means either he was clueless, or Sun is jumping on the Intel benchmark bandwagon.

  5. Re:RISC gives you more bang for your buck by imsabbel · · Score: 2, Insightful

    And nowadays it becomes more and more clear that there isn't much of an advantage anymore.
    All "CISC" chips are RISC cores with a decoder frontend, and the "cheaply developed" PowerPCs before the G5 were slaughtered by x86 in any bench but Photoshop Gaussian blur.

    And the G5 is only a side product of IBM's POWER4 program, which can't really be described as "low R&D expenses".

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  6. Re:Quick Q by renoX · · Score: 4, Insightful

    > Why would you want to have a virtual double processor when... you can actually get a second one?

    Because it is cheaper?
    SMT increases the size of the CPU very little and can give some good improvements (depending on the application and the OS, as said in the article).

    SMT can work in the same motherboard as a single CPU, contrary to what you said.

    And for the same price, the single CPU performance of your dual-CPU setup will be lower.

  7. The thing that got me about CPU performance by awol · · Score: 4, Insightful

    I did comp sci (undergrad) in the days when we used Unix/VMS to learn, and so I have a pretty good understanding of architecture and the basics of threads and processes. The one thing that never sat well with me was that as processor speed "exploded" in the last 5 years, I was under the impression that a "lot" of the performance increase was achieved by parallelising stuff in the execution core. (You can see that my knowledge is _limited_.) So as a result, unless your applications could somehow take advantage of this parallelism, a given bit of code would never really get the full benefit of today's uber processors. So all the speed gains were only really marginal improvements.

    I think the advent of SMT confirms that it is indeed the case that a given process cannot of itself (unless it is _real_ special) take full advantage of a modern processor, and so SMT is a way of reducing the problem by assuming that whilst one process ain't enough to take full advantage, two processes are able to take more advantage. It sure makes sense to me.

    But it also presents the very interesting question of the marginal benefit of execution pipelines compared to complexity in the front end to allow SMT. What I mean is, what are the trade-offs between having a "virtual" (for want of a better word) processor for each execution pipeline rather than using them to out-of-order execute parts of a single stream of instructions? Is it simply a question of the nature of the work being undertaken by the machine? I.e., for a processor with 8 pipelines serving 20 users doing stuff, would it be better doing 1 bit of work from each of 8 users, or maybe 2-4 bits of stuff from 4-2 users? And can we answer that question heuristically to allow the front end to make good use of each pipeline with a variable profile over the changing use of the machine? Fascinating (well, to me anyway).

    --
    "The first thing to do when you find yourself in a hole is stop digging."
  8. Re:SMT by sql*kitten · · Score: 2, Insightful

    most of the code I have seen churned out at software companies was done in such a rush because of deadlines the programmers didn't have time to optimize their code.

    I would argue that in the vast majority of cases, processor-specific microcode (as opposed to language and algorithmic) optimizations aren't the programmer's job - that's what a compiler is for. A professional-grade compiler like MIPSpro or ICC can generate code over twice as fast as GCC on the same processor, because it's smarter about processor-specifics. It's the same as on processors with OOOE and the like; the onus is on the compiler writers working with the hardware designers. On an older architecture like VAX, there was less need for that because the instruction set was so rich, but a more modern architecture like MIPS really needs it.

  9. Re:HT is awesome by Jeppe Salvesen · · Score: 2, Insightful

    Absolutely. But Perl means we can produce more software with fewer man-hours and fewer lines of code! Compared to our Java-based competitors, we kick butt, both in terms of development team size and in terms of performance and TCO.

    We have profiled our code and optimized the code where we spend most of our time. On those critical sections, we use most of the tricks in the book - dynamically created code, extensive use of hashes, etc. We can even write functions in C using XS if we want to!

    Basically, Perl is about freedom. You get a high-level language with a lot of freedom to do both genius and very dumb things. And then you can write (or have someone write) C code for those truly performance-critical functions.

    Perl looks ugly and looks hacky. I'll be the first to admit it. But once you figure it out, it's pretty damned powerful.

    Anyhow - would you have learned this if you didn't ask? Keep attempting to offend, man :)

    --

    Stop the brainwash