Slashdot Mirror


Intel to Increase Stages in Prescott

Alizarin Erythrosin writes "Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline than Northwood. No official numbers have been released, but The Reg is saying an Intel spokesman said that 30 stages seems to be a reasonable estimate. As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls. 'And just as the PIII proved faster than the early P4s in some applications, it's likely that Northwood will similarly prove faster than Prescott, which has clearly been designed for speeds of the order of 4GHz.'"

117 of 524 comments

  1. Holy pipelines by Breakfast+Pants · · Score: 3, Funny

    With all these pipelines you'd think intel was Bush and Prescott was Afghanistan.

    --
    WHO ATE MY BREAKFAST PANTS?
    1. Re:Holy pipelines by k4_pacific · · Score: 5, Funny

      Recall that GW Bush's grandfather was Prescott Bush.

      --
      Unknown host pong.
    2. Re:Holy pipelines by Breakfast+Pants · · Score: 2, Informative

      There may be no oil in Afghanistan, that's why I didn't say oil wells. Also, you must not have heard about the huge plans for a giant ass pipeline that will pass right through Afghanistan.

      --
      WHO ATE MY BREAKFAST PANTS?
    3. Re:Holy pipelines by wwest4 · · Score: 3, Offtopic

      it's still on the agenda.

      a trans-afghan pipeline has been encouraged by the us for years preceding the latest invasion of the country. it may never be built, but it is still being pushed by the US. There has been news trickling in fairly steadily in the past two months about this. eg from times of india jan 12

      the kazakhs HAVE a good deal of oil/gas - it needs to get south and west. maybe you're referring to the BTC pipeline project that replaced the first trans-afghan pipeline plan.

      the idea put forth by the "conspiracy nuts" is that the US had an interest in occupying the region because their presence means they can fund and participate in the installation of new export infrastructure (like the BTC, in which US-based Unocal is involved). The war in Afghanistan meant bases in neighboring countries like Uzbekistan, Pakistan, and Kyrgyzstan, which allows for a permanent regional presence.

      it doesn't really matter where the pipeline runs -the US couldn't have participated as easily if it hadn't established a presence.

      maybe the conspiracy nuts should hold off on the apologies after unocal donates a portion of their profits to the poverty and war-stricken afghan people, or towards the all-too-modest $160 million reconstruction plan. (To put this into perspective: this doesn't even approach the size of the deficits some of the US' state budgets run).

    4. Re:Holy pipelines by mikeabbott420 · · Score: 3, Interesting

      Could we explain to people the difference between megahertz and performance by comparing it to cars? Sure, the Intel xxx does yyy, but that's a 4-cylinder (IPC) that does yyy rpm vs. an 8-cylinder (IPC) that does zzz rpm but makes more horsepower. Megahertz = rpm, IPS = horsepower. If the general public understood that megahertz was rpm, not horsepower, Intel's talented engineers could build great things, freed from the marketing department's focus on rpm.

      --
      This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
  2. Bang for your buck by ObviousGuy · · Score: 5, Funny

    Northwood was really unsatisfying. I found that for the money, it was too short with too few stages. While gameplay was fine, the lack of stages simply made the cost not worth it for me.

    2 stars.

    --
    I have been pwned because my /. password was too easy to guess.
    1. Re:Bang for your buck by johnnorthwood · · Score: 3, Funny

      Hey... I have had many ladies say I was satisfying.

    2. Re:Bang for your buck by Anonymous Coward · · Score: 2, Funny

      Yup, you served them food quickly and got their order right every time. That's not a small feat for a fast food worker though.. You wear that employee of the week badge with honour.

  3. Size of pipeline by odeee · · Score: 4, Funny

    It's not the size of your pipeline that counts... it's how you use it.

    1. Re:Size of pipeline by sfraggle · · Score: 2, Funny

      I hear Prescott packs quite a punch.

      --
      were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
    2. Re:Size of pipeline by Hoser+McMoose · · Score: 4, Interesting

      Ironically enough, that's quite accurate for processors!

      A 6-stage pipeline with terrible branch prediction and all sorts of holes in it isn't going to do any good at all, while a 30 stage pipeline with great branch prediction (and the P4 does have great branch prediction) and few bubbles or holes (improved SMT, aka hyperthreading, is supposed to help here) will do wonders.

      Of course, the real question is not how long the total pipeline is, but the branch mispredict penalty. It should be noted that the "Northwood" P4 has a 28-stage pipeline, but only a 20-stage mispredict penalty. If the "Prescott" has a 30-stage pipeline with a 22-stage mispredict penalty, it isn't exactly a huge change.
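
      To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 20- and 22-cycle penalties are the figures quoted above; the branch frequency (1 in 5 instructions) and the 95% prediction accuracy are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def mispredict_cycles_per_1000(penalty_cycles, branch_fraction=0.20, accuracy=0.95):
    """Average cycles lost to branch mispredictions per 1000 instructions.

    branch_fraction and accuracy are assumed, illustrative values.
    """
    branches = 1000 * branch_fraction
    mispredicted = branches * (1.0 - accuracy)
    return mispredicted * penalty_cycles


northwood = mispredict_cycles_per_1000(20)   # 20-cycle penalty quoted above
prescott = mispredict_cycles_per_1000(22)    # hypothetical 22-cycle penalty

print(f"Northwood-like: {northwood:.0f} cycles lost per 1000 instructions")
print(f"Prescott-like:  {prescott:.0f} cycles lost per 1000 instructions")
print(f"Relative increase: {(prescott / northwood - 1) * 100:.0f}%")
```

      Under these assumptions the two extra penalty stages cost about 10% more misprediction cycles, which only matters if the clock speed gain is smaller than that.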

  4. I guess the home market rules... by ghostis · · Score: 4, Interesting

    I work at an engineering firm. The deep pipelines in the current P4 perform so poorly with general number crunching (e.g. matlab) we have almost completely switched to Athlons and are seriously considering Opteron.

    -ghostis

    --
    Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
    1. Re:I guess the home market rules... by LehiNephi · · Score: 4, Insightful

      I see this as a huge opportunity for AMD. They rate their processors based on how many times faster than a Duron 1 GHz runs. Thus, an AthlonXP3000+ runs three times as fast.

      However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3.

      Thus, as chips get faster, AMD's chips will get better performance, not only cycle-for-cycle, but even rating-for-rating!

      --
      Help find a cure for cancer. Join the [H]orde
    2. Re:I guess the home market rules... by Aardpig · · Score: 4, Insightful

      If you were to use SSE2 you would see an incredible performance boost.

      I doubt it, I really do. Present-day x86 chips aren't limited by their FP processing speed, the real problem is memory latency and bandwidth. For instance, my 1.8 GHz P4 regularly performs in excess of 1 Gflops when running benchmark tests for the ATLAS BLAS. However, these benchmarks are specifically designed to fit in cache, to have predictable branching, etc etc.

      Unfortunately, in real-world situations cache thrashing is difficult to avoid, and accurate branch prediction is a highly non-trivial affair. When a prediction turns out to be wrong, the cost of refilling a stalled pipeline increases in proportion to the pipeline length. The ever-lengthening pipelines of P4 chips mean that, although their FP performance may r0x0r, the overhead of stalls makes production code run like treacle.

      --
      Tubal-Cain smokes the white owl.
    3. Re:I guess the home market rules... by EulerX07 · · Score: 3, Interesting

      Matlab can hardly be beat in speed when you need to produce custom software to crunch huge matrices full of numbers. You can have a GUI designed, working, put some code quickly together that can grab data from any txt format, run mathematical formulas on those data. Then you can do any operations you want on the matrices that are in memory and easily accessible. Want to throw your data into a chart? A few minutes of coding and you've got the perfect chart on there.

      Back in my days of internship at the canadian space agency, I'd program multiple custom apps to pre-process the data before it being fed to the mainframes of a contractor for finite element analysis. Matlab is the tool to use for anybody involved in scientific projects. Yes, your code in C will run much faster, but it'll take significantly longer to get it up and running.

      If you run a lot of loops and it's really bogging the performance down, you can program just those sections of code in C and compile with matlab libraries to be able to use it in Matlab like the native commands. I did one piece of code that took a finite element file and created the 3d model in matlab. Took 20 minutes to run the code in matlab, 3.45 seconds once I had compiled the tough part of the code in C.

      In the end it's all about using the right tool, and for engineering/matlab, Matlab is excellent.

    4. Re:I guess the home market rules... by Mr.+Frilly · · Score: 2, Interesting

      Just another (single) data point to add, for the image reconstruction software I use routinely, I get these performances:

      intel pentium IV, 3.2 GHz: 5.0 minutes
      athlon XP, 1.533 GHz: 5.7 minutes
      intel pentium III 733 MHz: 8.1 minutes

      From the PIII to the PIV, a 340% increase in processor speed, I get 60% increase in performance...
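
      For anyone who wants to check the arithmetic, a small Python sketch follows. The clock speeds and timings are copied from the list above; the helper name `gains` is made up for this example, and "performance" here simply means the inverse of run time.

```python
# Clock speeds (MHz) and run times (minutes) copied from the list above.
results = {
    "Pentium 4 3.2 GHz": (3200, 5.0),
    "Athlon XP 1.533 GHz": (1533, 5.7),
    "Pentium III 733 MHz": (733, 8.1),
}

base_mhz, base_minutes = results["Pentium III 733 MHz"]


def gains(mhz, minutes):
    """Return (clock increase %, throughput increase %) relative to the PIII."""
    clock_gain = (mhz / base_mhz - 1) * 100
    perf_gain = (base_minutes / minutes - 1) * 100
    return clock_gain, perf_gain


for name, (mhz, minutes) in results.items():
    clock_gain, perf_gain = gains(mhz, minutes)
    print(f"{name}: clock +{clock_gain:.0f}%, performance +{perf_gain:.0f}%")
```

      This gives roughly +337% clock for +62% performance on the P4, against roughly +109% clock for +42% performance on the Athlon XP.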

    5. Re:I guess the home market rules... by Cecil · · Score: 2, Interesting

      The deep pipelines in the P4 perform poorly, period. Even when running simple desktop apps on a Windows machine, I notice my P4-2.5GHz w/1GB RAM at work often jerks around or lags, while my Athlon 1900XP+ w/256MB RAM at home works like lightning. Obviously processor is not the whole story, but I think that under typical, multi-tasking usage, the deep pipelines are even more painful than benchmarks suggest.

      Disclaimer: I am not an EE, so I could very well be full of shit.

    6. Re:I guess the home market rules... by woodhouse · · Score: 2, Interesting

      Each to their own I suppose. I admit I don't have much experience with Matlab (I'm planning on keeping it that way). As a college project, we were told to use matlab for a computer vision task. I tried everything to optimise it, followed all the guidelines on vectorising code and not using loops, and eventually found that the only way to do it was to write the critical code in C, as you suggest (this improved the speed by a factor of 100). In the end, there was almost no advantage from having used matlab and I would have been better to just write the whole thing in C.

      What baffles me the most is that people use it for image processing, of all things. Surely if performance is important anywhere, it's here? It doesn't help that Matlab 6.5 runs on a Java back end.

    7. Re:I guess the home market rules... by timeOday · · Score: 4, Informative

      No, surely AMD will simply change their metric to match whatever Intel is putting out. IMHO there's no way AMD will label something 4000 when it's faster than a PV 4400. That defeats the *whole point* of not using the real clock speed in the first place.

    8. Re:I guess the home market rules... by Laser+Lou · · Score: 3, Insightful

      However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3

      I don't have hard data on this, but doesn't the impact of the pipeline depend on how the software it runs is compiled? If the object code is compiled to reduce branches, the longer pipeline should drastically speed up processing. That would theoretically make a 3GHz P4 MORE than three times as fast as a 1GHz P3.

      --
      No data, no cry
    9. Re:I guess the home market rules... by buysse · · Score: 2, Interesting

      I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).

      --
      -30-
    10. Re:I guess the home market rules... by be-fan · · Score: 2, Informative

      SSE does standard IEEE754 single or double precision math. The Pentium 4's SSE2 unit (actually its FPU, but that's a detail) can handle 4 single-precision or 2 double-precision operations per cycle.

      --
      A deep unwavering belief is a sure sign you're missing something...
    11. Re:I guess the home market rules... by tomstdenis · · Score: 5, Interesting

      It isn't just branches though. For example, a 32x32=>64 multiplication on the P4 can take up to 14 cycles [iirc] whereas on the Athlon it's 6 cycles. So for example,

      MUL EAX,EBX [DIMMMM]
      ADD ECX,EAX [_D___IE]

      So in total takes seven cycles.

      The same code on the P4 would take at least 15 cycles. What's worse is consider

      MUL EAX,EBX [DIMMMM_]
      ADD ECX,EBX [_DIE___]
      INC ESI [_DIE___]
      DEC EBP [__DIE__]
      ADD EBX,EDX [__D__IE]

      Again this takes seven cycles, especially since instructions 1 and 2 can start in cycle two in pipes 1/2.

      Compare that to the P4 which only has two ALU pipes [one of which is now stalled for 14 cycles for the MUL to finish].

      Tom

      --
      Someday, I'll have a real sig.
    12. Re:I guess the home market rules... by tomstdenis · · Score: 2

      My second example is slightly off. It would be

      MUL EAX,EBX [DIMMMM__]
      ADD ECX,EBX [_DIE____]
      INC ESI [_DIE____]
      DEC EBP [____DIE_]
      ADD EBX,EDX [____D_IE]

      [use a fixed-width font to read that...] for eight cycles not seven.

      [Where D = decode, I = issue, E = execute]

      Tom

      --
      Someday, I'll have a real sig.
    13. Re:I guess the home market rules... by tomstdenis · · Score: 3, Informative

      MMX doesn't do FP [it's int only].

      Both SSE and 3DNOW use formats the normal FPU can read so I'd say it's standard [hint: you can assign an array of two well aligned floats to a 3dnow 64-bit word and use it].

      SSE supports both double/float precision [as another poster pointed out]. Heck even the Athlon supports SSE [though I wouldn't use it. Hint: SSE reg == 128-bits and the Athlon CPU can only perform up to 64-bits of read per cycle...]

      Tom

      --
      Someday, I'll have a real sig.
    14. Re:I guess the home market rules... by zenyu · · Score: 3, Informative

      I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).

      It performs precise math by default. You can only use 32 or 64 bit floats; the "long double" 80 bit floats are not supported. But this often isn't a problem. You can also turn off denormals, along with interrupts on bad math (divide-by-zero type stuff). Turning those off hasn't given me any performance boost, but I still consider these things features, not bugs. There are some low precision operations available, but no compiler I know of uses them unless you ask for 'em. I do in some cases, but then I know what I'm getting.

      A math person may give you a better answer than me. I'm a graphics person, a field where SSE2 is a godsend compared to the stack based floating point units that came before.

    15. Re:I guess the home market rules... by wmansir · · Score: 2, Insightful

      Don't you see, that is the entire point of moving to a longer pipeline: to inflate the MHZ.

      Intel don't care if a Prescott 4.0GHZ is twice as fast as a Pentium 4 2.0 GHZ. Just as a P4 2.0GHZ is not twice as fast as a PIII 1.0GHZ. They just want to get to 4.0GHZ.

      Intel doesn't care if AMD's 4000+ is actually faster than their 4000MHZ part, they just want to have a 4000MHZ part to market before AMD.

    16. Re:I guess the home market rules... by be-fan · · Score: 2, Informative

      The decode bandwidth is a single x86 instruction per clock, but that's not a huge problem because of the trace cache. The issue bandwidth is three u-Ops per cycle, but this isn't a huge limitation because the P4 is a relatively narrow architecture. It's only got two ALUs and two FPUs compared to an Athlon's three ALUs and three FPUs.

      --
      A deep unwavering belief is a sure sign you're missing something...
    17. Re:I guess the home market rules... by Nurf · · Score: 2, Informative

      Note to mods: parent is clearly wrong. How did this get +5? As others have stated, the AMD rating is an estimation of how fast their processor is compared to an Intel Pentium 4 running at the PR speed in megahertz.

      No. You are clearly wrong. The PR rating is relative to an AMD Thunderbird Core. If you don't know what you are talking about, you should just shut up. Here is a link and here is another.

      Intel are shouting about megahertz because it's all they have. For most real world applications (i.e. not encoding video) the Pentium 4 cores are abysmally inefficient. Anything that is branch heavy (such as a compiler, for example) is a complete nightmare for a P4.

      For that matter, I'm writing a video encoder in my spare time, and the AMD chips are still a better match for the sort of stuff I am doing.

      --
      ---
    18. Re:I guess the home market rules... by gjm11 · · Score: 5, Funny

      "DIMMMM / DIE / DIE / DIE / D_IE" ... You aren't an employee of Rambus Inc. by any chance?

    19. Re:I guess the home market rules... by JamesP · · Score: 2, Informative

      Present-day x86 chips aren't limited by their FP processing speed

      The problem with x87 is not speed. It uses an antiquated programming model, using a stack. So you have to shuffle things in the stack to make it work, and this takes a lot of time.

      SSE2, OTOH, is very easy and fast. 2 calculations at the same time, and in the format A+B=C

      --
      how long until /. fixes commenting on Chrome?
  5. History repeats itself..... by Selecter · · Score: 5, Interesting
    I guess Intel's short term game plan is to keep the MHz game going yet again until they can get something going on the 64 bit front worth having.

    I suspect AMD and even Apple are going to shrink Intel's bragging rights in that same time frame unless Intel gets their act together. From AMD's recent earnings report it sure seems somebody is buying Athlon 64's.

    Intel blew it when they made the decision to let 32 bits ride for another 2 to 3 years. They look like old fuddy-duddies now. It's AMD and Apple via IBM that has the cool shit.

    1. Re:History repeats itself..... by dpilot · · Score: 4, Insightful

      Intel has backed themselves into a bit of a corner, in the process of repeating history. With Itanium, they've proven that they're more concerned with their own strategies than they are with delivering solutions to their customers. But they've sunk so much money and image into Itanium that they can't back out, yet. No doubt there's someone inside the company, probably a wild duck, working on the right time to jump ship and how to spin it.

      In the meantime, Intel has the one-two bait and switch with P4-Celeron and the true P4. If they didn't have a TON of money and market clout, they'd be in big doo-doo right about now. As it is, AMD is the one in big doo-doo, not because they have the lesser product, but because of Intel's clout.

      Listen to any computer commercial, and they pretty much all have those 5 co-advertising tones at the end. That's monopoly power, that's market clout. (If I were in charge, the antitrust penalty would ratchet up every time those tones sounded.)

      Maybe Intel blew it, but they'll survive.

      --
      The living have better things to do than to continue hating the dead.
    2. Re:History repeats itself..... by Pieroxy · · Score: 4, Insightful

      Dude, it's the same with any innovation. You have to wait for the software to follow. Why are you making a big fuss out of it? When they introduced the P4 with its new architecture, tests showed that it wasn't all that much faster than a good old P3. Then compilers and software in general adapted and it became faster.

      Same with the P3, the P2, the Pentium, the 486, 386, 286 (Even though no one adapted to this shit) and the 086. So yes, history repeats itself, and it is for good (at least on this one).

    3. Re:History repeats itself..... by Jerf · · Score: 4, Insightful

      Maybe Intel blew it, but they'll survive.

      We don't want them to die. We want them to pass through it and come out an older and wiser company, less inclined to pull shit it has learned the hard way it can't get away with, no matter how big it is.

      Compare the IBM of 2004 to the IBM of 1984.

      If Intel were to "die", the resulting market would have lost the wisdom that Intel is likely to learn over the next couple of years, barring some technical miracle.

  6. So What ? by El+Cabri · · Score: 4, Interesting

    I'm kind of tired of the perpetual whining of armchair hardware designers. So the happy few, highly paid architects, 30 years-experience in the industry, hundred-published scientific papers at Intel decide that the next gen chip will have more stages and they have to be called morons ? How do you know better ? Hasn't intel produced the fastest chips on the market with each and every micro-architectural generation ? Long pipelines = costly branch mispredicts, whoooaah, you're so bright why don't YOU have the job leading the prescott team ? branches can be predicted. Long pipelines can improve throughput. Microprocessors are all about trade-offs. Let the pros do the work and go back playing Quake.

    1. Re:So What ? by fredmosby · · Score: 2, Insightful

      I agree with the argument you are trying to make. But it would probably work better if you were less condescending.

    2. Re:So What ? by addaon · · Score: 5, Insightful

      Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others.

      And of course, Intel's motivations are entirely performance, or at least price/performance, not marketing.

      The fact that every other company has chosen a different design decision and has made better chips as a result is just an illusion foisted on us by those who think their own thoughts.

      --
      I've had this sig for three days.
    3. Re:So What ? by afidel · · Score: 3, Insightful

      Intel's engineers didn't decide the direction of the processor. The whole direction of Intel's desktop line has been controlled by marketing concerns since the initial stages of development on the P4. The engineers got to do as they wished with the Itanium, but unfortunately they went too far the other way and completely forgot about marketing concerns like running legacy code.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    4. Re:So What ? by harlows_monkeys · · Score: 3, Informative
      Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others

      Intel P4 and Xeon beat 4 of the 5 you name on SPEC.

    5. Re:So What ? by stevesliva · · Score: 5, Funny
      I'm kind of tired of you armchair OS coders. So the happy few, highly paid Microsoft employees, 20 years experience in copying IBM, thousands of stock options in Redmond decide the next gen OS will have some wack FS and they have to be called morons? How do you know better? Hasn't Microsoft produced the best selling OS on the market for 15 years? Why don't YOU have the job leading the Longhorn team?

      Oh. Yeah... LINUX.

      Nevermind-- go back to writing the best OS there is.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    6. Re:So What ? by drinkypoo · · Score: 2, Interesting
      obviously branches cannot always be predicted, and intel has traditionally (not a long tradition, OoO is relatively new, but still) been poor at it. Witness the amazing slowness of the P4 compared to the P3, clock for clock. Some of those pipeline stages in the current P4 are already there for signal propagation, I suspect more of them in this core will be so-called "Drive" stages in which the CPU is doing nothing but waiting for signal propagation.

      Intel has the fastest chips (by a fine RCH), but AMD has consistently produced the best price:performance ratio and since the K6 faded over the horizon, AMD has got its act together WRT chipsets and compatibility, to the point where there is no longer any reason to get intel over AMD. AMD has realized that since CPUs are usually doing many things at once, it is better to be broad than deep.

      Intel is going to have to do something really spectacular soon or continue to lose market share to AMD. Personally I hope they blow it, because I'm so much happier with Athlons than with any intel CPU. AMD's only black mark is the K6, which until the K6/3 has only 24 bit FPU, and as such has many compatibility problems. Of course, if you're running linux, you'll never see them, so the faster K6s are not useless yet. (Cobalt Raq3 owners rejoice.)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    7. Re:So What ? by adrianbaugh · · Score: 4, Insightful

      We're supposed to be impressed by Intel's latest and greatest chip beating Alphas that aren't even produced anymore?
      I'm not wishing to knock Intel, but it seems that these days whoever has the newest fabrication plant has the fastest chip. Intel brings out a new line of chips: they're faster. So AMD brings out a new line of chips later on: bang! they're faster still. And so the merry dance goes on.
      Of course, this is all to the consumer's good as it means there's far more competition. But as far as the consumer is really concerned it doesn't matter so much who currently has the fastest chip as whose chip currently offers the best value while still being "fast enough". For my money that's been AMD for a while now.

      --
      "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
      - JRR Tolkien.
    8. Re:So What ? by bhtooefr · · Score: 2, Informative

      There wasn't much difference on IPC, but AMD did make a 386DX/40, whereas Intel only made a 386DX/33. 8088 was identical IPC and clock (4.77MHz, Intel design, Intel and AMD build), but 80286 wasn't on clock (was on IPC) (6 to 25MHz, Intel design, Intel (6-12MHz), AMD (6-20MHz), Harris (6-25MHz) build).

    9. Re:So What ? by EmagGeek · · Score: 2, Insightful

      A brief history of microprocessor development:

      The company I work for invented the first 16-bit microprocessor EVER, the CP1600 (ok, to be fair, it was a joint effort between us and a partner company), which was released in late 1974, when Intel was a scant 6 years old and PC meant "Pissing Clear." Intel was still a long 4 years away from introducing the 8086, which was only an 8-bit CPU anyway.

      Nobody ever talks about the CP1600 because it was not oriented toward "personal" computers. After all, why the hell would anyone want their own computer? The CP1600 was designed and later integrated into Honeywell's TDC2000 distributed process control system, the very first distributed digital process control system.

      Chances are, the gas that is sitting in your car was refined using a TDC2000 or descendant control system, so the CP1600 lives on in all of us just about every day.

      Intel just got lucky with marketing, and it was the old consortium, LIM, that made the PC a reality. Those of you who were born before the 80's probably remember first hand what LIM was, but I'll leave it as an exercise for you newbies to find out. You'll be amazed at who used to be bedfellows...

    10. Re:So What ? by PlazMan · · Score: 2, Interesting

      How about some whining from a real hardware designer?

      I used to work at Intel designing micros, and I can assure you that there are several highly-qualified and brilliant people in the microprocessor architecture and design teams. Unfortunately, Intel management directed them to trade performance for MHz about seven years ago and now they're finally paying for that foolishness. Lots of really good people have either left the company or drifted away from the project teams to the labs.

      Most of the people that I know who work or worked on the Prescott team say that it was probably the worst managed project ever at Intel. Take two (rival) divisions and tell them to work together, combine that with a design-by-committee mentality, and throw in a completely unreasonable schedule (imagine being in "crunch mode" for 2 years straight).

      Intel has succeeded in staying ahead by virtue of brute force. They have the resources to make diving save after diving save. The manufacturing and process engineers are unbelievably resourceful. The Northwood team has saved their bacon for the past two years as Prescott has missed deadline after deadline. It will be interesting to see if the behemoth can change its course and use its huge amount of engineering talent more efficiently in the future.

  7. Pipeline stalls by k4_pacific · · Score: 4, Interesting

    When the processor branches, all the partially executed instructions in the pipeline are lost.

    They could minimize this by creating two different conditional branch instructions for each condition: one for cases where the programmer expects the branch to occur most of the time, and one for where the branching rarely occurs. They could then optimize the pipeline behavior for each case. If it's a 'likely branch' instruction, it could start fetching instructions from the branch target. If it's an 'unlikely branch' instruction, it could prefetch the next instructions after the branch.

    This would work well in loops where every time but the last, the processor branches back to the top.
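
    A toy Python model of the loop case, as a sketch only: it assumes the hardware always predicts whatever the static hint says, and it counts wrong guesses for a loop-closing branch. The function name and the 100-iteration loop are made up for illustration.

```python
def count_mispredictions(outcomes, hint_taken):
    """Count mispredictions when a branch is always predicted per its static hint."""
    return sum(1 for taken in outcomes if taken != hint_taken)


# A loop-closing branch executed 100 times: taken 99 times, falls through once.
loop_branch = [True] * 99 + [False]

likely = count_mispredictions(loop_branch, hint_taken=True)
unlikely = count_mispredictions(loop_branch, hint_taken=False)

print(f"'likely branch' hint:   {likely} misprediction(s)")
print(f"'unlikely branch' hint: {unlikely} misprediction(s)")
```

    With the 'likely' hint only the final loop exit is guessed wrong; with the wrong hint nearly every iteration would flush the pipeline.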

    --
    Unknown host pong.
    1. Re:Pipeline stalls by bmorris · · Score: 2, Interesting

      Read up on predication. http://www.geek.com/procspec/features/itanium They do some cool stuff with it in Itanium.

    2. Re:Pipeline stalls by qbwiz · · Score: 2, Informative

      This was already implemented on the PowerPC 601 and 603 (and possibly others, my book is getting rather old). Additionally, the Alpha 21064 and 21064a processors could optionally guess a branch as taken if it went back(loops), and not taken if it went forward(ifs).
      Most processors nowadays use dynamic prediction, basing current predictions upon whether earlier branches were taken or not taken. The branch unit on the P4 predicts with an accuracy of about 95%.
      One more interesting way of doing it is to try executing both paths at the same time, and throwing out the one that is incorrect. This requires a lot more logic (although pentium 4's already include "hyperthreading", and this is somewhat similar), and with such high accuracies probably would actually be much worse than the current way of executing.
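
      As a rough illustration of dynamic prediction, here is a Python sketch of a single 2-bit saturating counter. This is the generic textbook scheme, not a description of the P4's actual predictor, and the workload (a 10-iteration loop run 50 times) is a made-up example.

```python
def two_bit_predictor_accuracy(outcomes, state=2):
    """Simulate one 2-bit saturating counter; return the fraction predicted correctly.

    States 0-1 predict not taken, states 2-3 predict taken.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward 3 on a taken branch, toward 0 otherwise, saturating.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Hypothetical workload: a 10-iteration loop run 50 times
# (branch taken 9 times, then not taken once at loop exit).
history = ([True] * 9 + [False]) * 50
print(f"accuracy: {two_bit_predictor_accuracy(history):.0%}")
```

      On this pattern the counter only misses the loop exit, so it gets 9 out of every 10 branches right; real predictors add branch history to do better than that.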

      --
      Ewige Blumenkraft.
  8. It's not that it'll be slower... by Lothsahn · · Score: 5, Informative

    It'll most likely be slower per clock cycle.

    What this means, is that it will take a faster clock cycle (4GHZ, for instance) to do the same amount of processing as the Northwood core. However, increasing the pipeline should allow Intel engineers to achieve higher clock speeds, as the longest transistor path will likely be shorter (faster switching times).

    In essence, Intel is attempting to increase the speed of their CPU's by focusing on increasing the clock speed (P4), while AMD is focusing on increasing the amount of calculations per clock cycle (Hammer).
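
    The trade-off can be written as run time = instructions / (IPC x clock). A minimal Python sketch, where the instruction count and both IPC values are made-up numbers chosen only to illustrate the point, not figures for any real chip:

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Execution time = instruction count / (instructions per cycle * cycles per second)."""
    return instructions / (ipc * clock_hz)


WORK = 10_000_000_000  # hypothetical workload: ten billion instructions

# Both IPC figures are made-up values chosen only to illustrate the trade-off.
deep_pipeline = run_time_seconds(WORK, ipc=0.8, clock_hz=4.0e9)   # high clock, lower IPC
wide_core = run_time_seconds(WORK, ipc=1.6, clock_hz=2.0e9)       # lower clock, higher IPC

print(f"deep pipeline @ 4.0 GHz: {deep_pipeline:.3f} s")
print(f"wide core     @ 2.0 GHz: {wide_core:.3f} s")
```

    With these invented figures the two designs finish in the same time, which is the sense in which clock speed alone says nothing about performance.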

    Of course, there are a lot of more complex tradeoffs that factor in (e.g. branch prediction). I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.

    --
    -=Lothsahn=-
    1. Re:It's not that it'll be slower... by edrugtrader · · Score: 5, Funny
      I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.


      dude, i don't even read the articles.
      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
    2. Re:It's not that it'll be slower... by philthedrill · · Score: 3, Interesting

      It'll most likely be slower per clock cycle.

      Yes, I agree. My guess is that they're trying to achieve higher absolute performance. What surprises me is that this is still considered a P4 core, since adding pipeline stages (even 1 stage) is a very non-trivial task.

      This'll also kill the benefits of reduced power consumption of 90 nm technology (increase in area from the additional pipeline registers, increase in frequency), which is important in server design. An argument about the benefits of having a trace cache is the reduction in power consumption since you can remove some decoders (x86 decoders are horribly complex, yet having enough to feed the rest of the processor is critical for high performance). The P4 only has one x86 decoder (plus the uROM) and is able to perform well in general.

      It'll be interesting to see the power consumption numbers (average and max) as well as the die size. Also, I wonder how AMD's CPU rating system will change as a result of this.

  9. Intel bit by their own tricks? by lambadomy · · Score: 4, Interesting

    Assume for a second that Intel's P4 design was really meant to boost GHz numbers easily (to guarantee victory in the GHz war if not the performance war). If so, is the Prescott design now a result of Intel having to keep up with themselves? Obviously they could design a chip that is "faster" but runs at a lower clock speed than the P4s, but they've pushed the GHz number so much that now they're kind of hamstrung in their design options.

    1. Re:Intel bit by their own tricks? by bhtooefr · · Score: 2, Informative

      Mobile Pentium 4: Cooled-down P4
      Pentium 4-M: Redesigned cooler yet P4
      Pentium M (Centrino): Redesigned Pentium III to take advantage of modern technology (400MHz bus, SSE2, etc.) and be cooler yet.
      Celeron M: Pentium M failure/economic bin. Half the cache.

  10. Re-read the article the reg is GUESSING 30 by uarch · · Score: 5, Informative

    Re-read the Register article. It's not the Intel guy who said 30 stages, it's the Register who is guessing. They're assuming that since it went from 10 to 20 before it'll go from 20 to 30 now. It's not likely to end up being more than a few extra stages.

  11. Slower than Northwood? by StarCat76 · · Score: 4, Interesting

    Although the Prescott core will have a longer pipeline, it will probably end up performing a bit better clock-for-clock against Northwood. This is due to a couple of reasons. Firstly, Prescott has 1 MB of on-die L2 cache. That's a good bit, and one could see how the P4 was helped by the 2 MB L3 cache in the P4 "EE". Secondly, the new P4 will have improved hyperthreading. It will also have somewhat improved branch prediction and implements PNI (Prescott New Instructions), which will require a recompile to help things out. All in all, I see the Prescott as being just as fast or faster per clock than Northwood, mostly due to the doubled L2 cache.

  12. Low-power consumption devices by johnthorensen · · Score: 4, Interesting

    So, since Prescott has approximately a 30 stage pipeline, I guess Intel has decided to continue to ignore the low-power consumption market, leaving it open to people like VIA and Transmeta. This is really disappointing to a lot of folks in the embedded markets, who would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.

    Word has it that VIA is readying a new x86 processor to their line that supposedly has P3-class FPU performance while maintaining the same levels of poser consumption as its predecessors. It is expected that this processor may actually have a big win in front of it for DirecTV boxes. With the extra CPU horsepower, it should be exciting to see what nifty features come out of this, especially considering most set-top CPUs generally just act as "traffic cops" for the data moving between ASICs. If they're really making the move to this class of processor, perhaps they've got more in mind.

    --JT

    1. Re:Low-power consumption devices by Wesley+Felter · · Score: 3, Insightful

      Hello, Pentium M?

    2. Re:Low-power consumption devices by Pyro226 · · Score: 2, Interesting

      ...would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.

      Aside from the whole Earth getting sucked into oblivion thing, a black hole would make an excellent heat sink. I mean, not even light can escape its gravity - heat wouldn't stand a chance.

      --
      This message is encrypted with Quad ROT-13 to protect the author's copyright under the DMCA.
    3. Re:Low-power consumption devices by ottffssent · · Score: 2, Funny

      > ...the same levels of poser consumption...

      Think what that would do for the world! Poser-powered PCs? They'd absolutely *FLY* off the shelves. e=mc^2 says I could stop worrying about the electric bills and heat the house with computers. One poser a decade would more than do it.

      Utility computing my arse! What we really want is computing *without* using utilities, and this is it, folks, the real deal. Buy your poserPC today! ;)

  13. compilers by Mieckowski · · Score: 4, Informative

    I suppose that this makes having a good compiler a little more important. Compiling the same program for a G4 on a compiler other than GCC gave me a 100% speed boost. I don't know if branch mis-prediction came into play, but it had a conditional in its inner loop (it displayed the mandelbrot set).

  14. Sounds Like Marketing by Anonymous Coward · · Score: 4, Interesting

    It sounds like Intel has totally given up on efficiency, and has the Marketing department doing processor requirements now... (has to clock to xGHZ!)

    I've been working with Dual Opterons for a few months now, and have been very impressed as to their speed, heat dissipation, and bang for the buck.

    A large data transformation job (really doing a scrape of a mainframe report for data) on the order of 1.1GB processed much faster on an IBM E325 Dual Opteron 2.0ghz running 32bit Windows (ack) than my Dual 2.4ghz Xeon (w/HT) running Windows (double ack)....

    Yeah- it's not a benchmark, but it is real world performance.

    1. Re:Sounds Like Marketing by ProtonMotiveForce · · Score: 2, Insightful

      I'm confused. How is it marketing only when you produce a faster chip?

      That's like saying a gold medal winner in the olympics only ran for the monetary value of the gold they receive. It's actually quite freaking stupid.

      The chip is faster. There are many ways to get faster chips that generally boil down to high IPC or high clock. Why do you nimwits insist on bleating that if you go the high clock route you're only catering to marketing?

  15. Prescott vs. Northwood - Insides exposed by metlin · · Score: 2, Informative

    I had found an interesting article exposing the innards of the 775 pin Prescott -- see it here

    (Credit: Got it off The Register from this article)

  16. Myth? by The+Bungi · · Score: 5, Funny
    Alizarin Erythrosin writes "Further contributing to the MHz Myth ...

    Let me guess - 'Alizarin Erythrosin' is Cupertinus Elvish for 'Mac User', right?

  17. ummm... by circletimessquare · · Score: 2, Funny

    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.

    no, i didn't know that

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:ummm... by glwtta · · Score: 4, Funny

      Are you most of us?

      --
      sic transit gloria mundi
    2. Re:ummm... by addaon · · Score: 2, Funny

      I are.

      --

      I've had this sig for three days.
  18. Pipelines != Math Performance by TubeSteak · · Score: 3, Interesting
    My understanding was that AMD has 3 FPUs to Intel's 2. Oh, and AMD has 3 AGUs (integer units) compared to Intel's 2+2 (two of them also do other things). Anyways, most users, @ the Ghz speeds this proc is coming in at, will never notice the difference. For the people who care, they'll figure out what the proc can and cannot do... then use it accordingly. Unless you guys really want to run windows, why not compare the Opteron to a Dually Mac? After all, the PowerPC is really good at number crunching.

    How come your computer takes seconds to multiply two 400 digit #s, but ages to factor them?

    --
    [Fuck Beta]
    o0t!
    1. Re:Pipelines != Math Performance by tomstdenis · · Score: 5, Interesting

      More specifically the Athlon has three ALU/IEU pipeline pairs, 1 FADD, 1 FMUL and 1 FLOAD pipeline [e.g. you can't do 3 FP muls at once].

      The decoder can send up to three instructions into the pipeline per cycle. Actually that's only for directpath instructions [e.g. simple ALU/FP]. Vector instructions stall all three decoders.

      The ALU scheduler is fairly strong but it does have several weaknesses. From the manual I can't see that it can resolve dependencies from other pipelines. For instance,

      ADD EAX,EBX [DIE ]
      ADD EBX,EAX [D IE ]
      ADD ECX,EBX [D IE] - critical path
      INC ESI [ DIE ]

      D == decode, I == issue, E == execute [p. 227 of the Athlon opt manual].

      So the fourth instruction will always start on the second cycle despite the fact that ALU1/2 are blocked.

      Similarly the Athlon memory ports are a bit weak. There are read/write buffers but you still can only issue two reads or one write per cycle which is annoying.

      However, the strength of the Athlon ALU over the P4 ALU is that for the most part it can keep all three pipelines busy even if they are blocked at some stage [e.g. it can decode/issue even if blocked]. It doesn't say in the documentation but I could swear the Athlon can cross-pipe things too. Cuz sometimes I can mess the order of ops [e.g. create a dependency] and it executes in the same time regardless.

      Anyways, yeah it's all about the 3 ALUs and a decent scheduler. Something the P4 does not have.

      Tom

      --
      Someday, I'll have a real sig.
    2. Re:Pipelines != Math Performance by bhtooefr · · Score: 2, Informative

      I would, but I could just get the PCWorld Athlon FX-51@2.2GHz (almost identical to an Opteron 148) vs. 2xOpteron 246 (2.0GHz) vs. Athlon 64 3200+ vs. P4 3.2 vs. 1.8 G5 vs 2x2.0 G5 benchmarks, and see that in all benchmarks except Photoshop (on the dual G5), Quake III on the A64 and O246 (probably the SMP), and Word on the O246, the x86 CPUs *MURDERED* the Macs. Yes, even the P4. BTW, the AMD CPUs did well against the P4, except in the Quake III and Word benchmarks (Intel optimized code, maybe - Q3 is definitely Intel-optimized, but WORD?)

    3. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 3, Insightful

      Ok, so they benched Premiere 6, Photoshop 7, Microsoft Word, and Quake 3.

      Please tell me you have at least the 2 brain cells required to know that this benchmark is far from accurate.

      Anyone who does ANY form of editing on a Mac won't touch Premiere 6 with a 100-foot pole. Why? Because Final Cut Pro smashes it to little tiny pieces you could use to flavor your coffee.

      Microsoft Word? Tell me you're kidding. The benchmark was doing search-and-replaces. This is dependent on so many things ranging from hard disk caches to Microsoft's optimizations that it's almost not funny.

      And Quake 3. Almost entirely dependent on the graphics card and the drivers written for it.

      Nothing to see here, move along.

      (yes, I know I shouldn't feed the trolls)

    4. Re:Pipelines != Math Performance by tomstdenis · · Score: 2, Interesting

      "Vector instructions stall all three decoders.

      Yup. E.g. splitting movps -> movlps+movhps does indeed make a performance gain."

      I meant VectorPath instructions like DIV, LGDT, etc... ;-)

      They stall all three decoders. As for alignment the trick is to pack as many instructions into 8-byte aligned windows. According to the manual it fetches 24-byte windows and performs one [or two I forget... PDF is so far away] of scan/early decoding.

      So the trick is to organize your code so that each 8-byte segment has as many directpath instructions in it. That will minimize the decode latency [depending on the instructions may minimize issue/execute latency].

      The problem though is most ALU opcodes are at least two bytes [except for things like INC/DEC] and worse yet things like

      00000000 89D8 mov eax,ebx
      00000002 8B00 mov eax,[eax]
      00000004 8B0418 mov eax,[eax+ebx]
      00000007 A100040000 mov eax,[0x400]
      0000000C 8B8000040000 mov eax,[eax+0x400]

      So really offsets/constants are horrible [the last two instructions are 5 and 6 bytes each].

      If you have to step through arrays I think the idea would be to use the middle, e.g.

      00000012 03040B add eax,[ebx+ecx]
      00000015 81C100040000 add ecx,0x400
      0000001B 03040B add eax,[ebx+ecx]
      0000001E 81C100040000 add ecx,0x400

      Which takes 18 bytes. [four windows]. Another trick is to use a register for the step size...

      00000024 BA00040000 mov edx,0x400
      00000029 03040B add eax,[ebx+ecx]
      0000002C 01D1 add ecx,edx
      0000002E 03040B add eax,[ebx+ecx]
      00000031 01D1 add ecx,edx

      [16 bytes, 3 windows, ignore stalls.... ;-)]
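
      For anyone who wants to redo the byte counting, here is a small sketch that tallies encoded sizes from the hex dumps above and counts how many aligned 8-byte windows a sequence touches. The helper names are made up for this example, and the window count depends on the start offset and on what you choose to count, so the results need not match a hand tally.

```python
def encoded_bytes(hex_opcodes):
    """Total size in bytes of instructions given as hex strings."""
    return sum(len(bytes.fromhex(op)) for op in hex_opcodes)


def windows_touched(start, size, window=8):
    """Number of aligned `window`-byte blocks overlapped by [start, start + size)."""
    first = start // window
    last = (start + size - 1) // window
    return last - first + 1


# Opcode bytes copied from the two listings above.
immediate_step = ["03040B", "81C100040000", "03040B", "81C100040000"]
register_step = ["BA00040000", "03040B", "01D1", "03040B", "01D1"]

immediate_size = encoded_bytes(immediate_step)
register_size = encoded_bytes(register_step)
```

      Passing the listing offsets (0x12 and 0x24) as `start` to `windows_touched` shows how much the alignment of the first instruction matters.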

      Tom

      --
      Someday, I'll have a real sig.
  19. Doesn't matter to me... by TitusC3v5 · · Score: 4, Insightful

    ...since my next computer is going to house a G5.

    Personally I'm tired of trying to keep up with the GHz war between AMD and Intel. With our current technology, the only areas really pushing processing speeds are gaming and video/image applications (that I'm aware of). My grandmother doesn't need a P5 4 GHz to check her email, and neither do I if I simply want to write a paper.

    --
    And the masses cried out, "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0!"
  20. Scientific work on optimal pipeline depth by Wesley+Felter · · Score: 5, Informative

    In case anyone wants some hard facts:

    A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor, ISCA 2002.

    M.S. Hrishikesh, Norman P. Jouppi, Keith I. Farkas, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar (UT Austin, Compaq): The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, ISCA 2002.

    Eric Sprangle, Doug Carmean (Intel): Increasing Processor Performance by Implementing Deeper Pipelines, ISCA 2002.

    A. Hartstein and Thomas R. Puzak (IBM): Optimum Power/Performance Pipeline Depth, MICRO 2003.

    What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.

    1. Re:Scientific work on optimal pipeline depth by -tji · · Score: 3, Interesting


      > What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.

      Is that a typo, or am I misinterpreting the papers you linked above?

      In all but the Intel paper, it looked to me like they were saying the optimal pipeline depth was somewhere between 6 and 20 (depending on workload).

      In the introduction of the Intel paper, it says "Focusing on single stream performance". So, basically they are focusing on artificial benchmark performance.

    2. Re:Scientific work on optimal pipeline depth by bmoore · · Score: 2, Informative

      In addition to these, there is a paper coming out in the next ISPASS conference from some researchers at Notre Dame which looks at the effects of increasing the pipeline depth on the memory subsystem. It turns out that as you crank up the pipeline depth, you decrease the amount of "work" that can be done in a single cycle (obviously). The papers from ISCA fail to fully take the memory subsystem into consideration.

      Now, for the most part, Comp. Sci and Eng. majors assume L1-caches to have 1-cycle latencies. Most current "real" processors do NOT have 1-cycle latencies, because it takes too long to access a cache of any useful size. As the pipeline depth increases, it gets much more difficult to have large L1 (or L2) caches.

      Using the cache design simulator Cacti, we were able to get data on the approximate maximum sized cache, based off of pipeline depth (yes, this is fab-tech independent, check the paper for details). For example, if you consider a 5-cycle L1 delay (this is for a hit, not a miss), the maximum cache size you can get for a 10-stage is 512KB (as a 256K Instruction and 256K Data), for a 15-stage is 128, for a 20-stage is 32K, and a 25-stage would be 4K!

      We simulated up to a 50-stage pipeline (the Intel paper above claims that a 50-stage pipeline is best), and the fastest cache we could simulate at that speed takes 8 cycles to read from the L1. This is for a 4K cache! (2K instruction, 2K data).

      As anybody who has studied Computer Architecture before knows, caches need size to be effective. There are going to be some serious memory issues with these deeply-pipelined processors!

  21. Most of us know by scrote-ma-hote · · Score: 2, Funny
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls
    Yeah, um, who here actually knew that? I'm struggling to believe it's anywhere near 1/2. I'm sure a poll would clear this up.
  22. Re:Why? by phorm · · Score: 4, Interesting

    Which basically means, Intel can release a CPU with a higher MHZ rating for those that fall for such things.

    In reality the CPU will be somewhat faster than current ones due to the higher clock, but much less efficient.

    Why not just dump MHZ as a rating altogether? Wouldn't FLOPS-based (Floating-point Operations Per Second) or something similar be a better measurement? Maybe how far a simple program can compute PI in a second? We should really be looking at an operational-based measurement rather than a clock-based one.

  23. One-off number crunching... by Goonie · · Score: 3, Interesting
    In some situations, this kind of number-crunching is done with a custom program that is only run a few times. In such situations hacking something together in Matlab is quicker to get up and running than a full-blown C++ or, god forbid, FORTRAN program.

    Programmer time is much more expensive than faster machines.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  24. 4-stage pipeline by mosb1000 · · Score: 3, Funny

    Gosh, I'm feeling really left behind, my G4 400 only has 4 stages in its pipeline. At least it's built on a .22 micron process as opposed to the Pentium's measly .13 micron process. Yes, that was a joak

  25. It doesn't take much experience to notice flaws. by qortra · · Score: 2, Insightful

    I've not helped to design an operating system or really any part of an operating system, but I can damn well tell you that Windows ME was a shitty OS. It doesn't take any experience for me to tell this; I can determine this by simple observation.

    When the tire of my car explodes in an open road, it would not take much expertise on my part to diagnose it as a problem with my tire (they really aren't supposed to explode). And, when it happens to many other people with the same tire, it wouldn't take any expertise on my part to determine that it is probably a flaw in that tire design.

    If indeed long pipelines make non-predictable/chaotic software cause more mispredicts, and I notice that those applications do indeed run more slowly (or fail to see a speed improvement) on a new, more expensive, Intel processor, then I can assume without expertise that the design of the processor is not fitting for those applications.

    Also, when Intel's experienced engineers make a design decision, it might not be with the purpose of speed. In fact, I think few decisions there are. Intel, like Microsoft, is a marketing company. They like big numbers because they attract customers. Customers don't necessarily want really fast matlab, they want to be able to say "4 Ghz" because it makes them feel special.

    So, please don't be frustrated with people for making simple, astute observations. Intel engineers (with over 30 years' experience) don't necessarily have our best interests in mind.

  26. hmmm... by rebelcool · · Score: 2, Informative

    Generally one of the best processor architecture books out there is Computer Organization and Design. It does assume an amount of digital logic design (flipflops, clock, multiplexors and other basics) though it does have an appendix which briefly glosses over those. Honestly, to really "get" it you need an education in it.

    --

    -

    1. Re:hmmm... by geekee · · Score: 4, Informative

      Yes. Hennessy and Patterson (or in reverse, I have Stanford bias :-)) is the bible of computer architecture. They invented the RISC processor independently at Stanford and Berkeley. Their processors evolved into MIPS and SPARC.

      --
      Vote for Pedro
  27. Is this the right move? by Zebra_X · · Score: 4, Interesting

    Intel has shown no real interest in joining the 64-bit fray. Indeed, they don't have much choice. To release a 64/32-bit chip at this point would truly create an Itantic out of the Itanium. Microsoft would have more or less wasted its time producing low volume products such as SQL Server 64 and XP 64 (different than XP 64-bit extended which is as yet to be released). Another consequence of such a shift in strategy is that the people who invested in the itanic platform would be left the proud owners of an all but useless, yet very expensive, hardware platform.

    Most real world tests point to AMD chips being faster. The Int and Floating Point Tests still belong to the P4 3.2, but the P4 is having to pass the 1st place trophy to AMD when it comes to games and office productivity.

    And then there is price. For $320 you can get $700 worth of Intel performance. Mind you this is the AMD64 running in 32-bit mode.

    It would appear that all that is really needed to justify mass market adoption is a consumer OS, that would be Windows XP 64-Bit extended. Currently in Beta. The only delay there is that the .NET framework is not 64-bit ready. We can probably expect its release with VS.NET Whitby, a.k.a. .NET 2.0.

    After that - we just need to see some AMD adoption in the mainstream pc builders.

  28. Do you know what you're talking about ? by vlad_petric · · Score: 4, Interesting
    Matlab is mostly loops. Loops generate branches with high predictability, and as a consequence deep pipelining won't incur much performance loss. Furthermore there's a lot of parallelism in those loops, and the out-of-order execution engine is quite good at exploiting it (i.e. hide the long latency of FP ops by overlapping them)

    It's much more likely the size of the L2 cache is affecting you (i.e. your working set does not fit into P4's L2 cache but it does in Barton's).

    If you don't believe me, try the demo version of Intel VTune performance analyzer on matlab running one of your programs.

    How well your caches perform is probably the most important thing for a processor today, as the speed of the main memory is a couple of orders of magnitude under the speed of the processor. It takes a couple of hundred cycles to service an L2 miss, while a long FP operation takes at most 20 cycles.
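
    The usual back-of-the-envelope way to see this is the average access time: hit time plus miss rate times miss penalty. The sketch below uses that textbook formula with made-up cycle counts (a 10-cycle L2 hit and a 200-cycle miss penalty), chosen only to be in the same ballpark as the figures above.

```python
import math


def average_access_cycles(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Textbook average memory access time, in cycles."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical numbers: same cache, two working sets.
fits_in_l2 = average_access_cycles(hit_cycles=10, miss_rate=0.01, miss_penalty_cycles=200)
spills_out_of_l2 = average_access_cycles(hit_cycles=10, miss_rate=0.10, miss_penalty_cycles=200)
```

    Going from a 1% to a 10% miss rate takes the average from 12 to 30 cycles under these assumptions, which dwarfs anything a few extra pipeline stages would cost on well-predicted loops.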

    --

    The Raven

  29. Re:Why? by Wanderer2 · · Score: 5, Interesting
    Why not just dump MHZ as a rating altogether?

    Didn't AMD try to organise this and recently concede it wasn't going to happen?

    As long as any metric favours one particular manufacturer, the rest will try to replace it with a new one. The result will be more FUD and more confused users ("I've finally worked out what GHz are and you tell me I have to look at the number of flops?!?")

    </Pessimist>

    --
    I say we take-off and slashdot the site from orbit... it's the only way to be sure
  30. Effective pipeline by jmv · · Score: 3, Interesting

    I read somewhere that on the P4, when an instruction is already in the L1 cache, the pipeline gets shortened. That's because the L1 instruction cache stores pre-decoded instructions (micro-ops). This means that when the instruction is reached again, the decoding (and branch prediction?) steps are already done, shortening the pipeline. When the instruction is not in cache, there's already a big hit anyway. With that in mind, we'll need to see whether the extra pipeline stages in Prescott will still be there when the instruction is in the L1.

  31. what is "processor speed"? by rebelcool · · Score: 3, Informative
    Are you referring to "clock speed" perhaps? Clock speed is only one part of what determines performance, along with about a dozen other things I can think of.

    No processor, barring a complete architecture change (in which case it's a different processor entirely) will double its performance simply by doubling the clock speed.
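
    A quick way to see why: only the part of the run time that scales with the core clock gets shorter. The sketch below is a deliberately crude model with made-up numbers, assuming time spent waiting on memory or I/O does not change when the clock does.

```python
import math


def run_time_seconds(cpu_cycles: float, clock_hz: float, stall_seconds: float) -> float:
    """Time spent computing (scales with clock) plus time spent waiting (assumed fixed)."""
    return cpu_cycles / clock_hz + stall_seconds


def speedup_from_clock(cpu_cycles: float, stall_seconds: float, old_hz: float, new_hz: float) -> float:
    """How much faster the whole job gets when only the clock changes."""
    old = run_time_seconds(cpu_cycles, old_hz, stall_seconds)
    new = run_time_seconds(cpu_cycles, new_hz, stall_seconds)
    return old / new


# Hypothetical job: 2e9 cycles of compute plus 1 second waiting on memory/I/O.
doubled_clock = speedup_from_clock(2e9, 1.0, old_hz=1e9, new_hz=2e9)
```

    With those assumed numbers, doubling the clock from 1 GHz to 2 GHz only makes the job 1.5x faster; the speedup reaches 2x only when the stall time is zero.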

    It really depends on how you define performance too and what your software is doing. Doing heavy I/O? Processor has little to nothing to do with I/O - it just hands it off to the bus and I/O controllers to take care of and then does something else while waiting for the interrupt.

    --

    -

  32. Smart Business Move by m3j00 · · Score: 2, Insightful

    Intel is trying to move chips. One way to improve your sales is to drum up higher GHz for the uninformed masses. If you can do this while still producing competitive chips, you will outsell a similar performing chip that runs 700MHz or so slower than yours.

  33. thats branch prediction... by rebelcool · · Score: 2, Informative
    and is now common. These days it usually works by maintaining a history table of past branch behavior. Generally if you've had a lot of branches before, you're in a loop, and statistically are likely to stay in the loop.
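
    One common textbook form of such a history table is an array of 2-bit saturating counters indexed by the branch address. The sketch below is a minimal illustration of that scheme, not a description of any particular CPU; the class name, table size and trace are assumptions made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 1024):
        self.table_size = table_size
        self.counters = [1] * table_size  # start at "weakly not taken"

    def _index(self, pc: int) -> int:
        return pc % self.table_size

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """trace: list of (pc, actually_taken); predict first, then learn the outcome."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# Hypothetical loop branch at 0x40: taken 9 times, then exits, repeated 10 times.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
loop_accuracy = accuracy(TwoBitPredictor(), loop_trace)
```

    Because the counter needs two wrong guesses in a row to flip, a single loop exit does not make it forget that the branch is usually taken.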

    You can also go back and "fix" instructions to an extent (and not in all cases) while in the pipeline in case of incorrect branching. x86 sort of sucks for this though because of the variable length instructions.

    A lot of computer science is based on those kind of statistics. You see it in memory management as well. Most data structures are created and quickly destroyed. But those that aren't tend to stay around for a very long time and not point to quickly created and destroyed ones.

    --

    -

  34. yep by rebelcool · · Score: 4, Insightful
    MIPS is a nice architecture to learn. Clean and simple. Useful, too, if you get into game design (sony uses MIPS based chips in the playstations)

    Stay away from x86 if you're just starting out...

    --

    -

  35. Dilbert Marketing by stuffedmonkey · · Score: 2, Interesting

    This is the end result of engineering driven marketing... When you relentlessly try to make the chip with the "most megahertz", you lose focus. AMD and Apple/IBM have started to pull away in quality - in terms of actual work done per clock cycle. While it's true that the average Joe or PHB might not know any better - you can only continue on so long...

  36. Re:Why bother with x86... by Indy1 · · Score: 2, Insightful

    x86 is old and flawed, but it has such a base of o/s 's and apps for it that it's not funny. Look at Itanium. There are hardly any programs available for it, and it's hugely expensive. In order to jump to anything new, you need Uncle Bill to port Windows to whatever you're coming up with, and we all know how fast and effective M$ is at doing such a complicated task (i.e. not very fast or effective at all). Sure, Linux and the BSDs can be ported without a huge amount of work, but a CPU manufacturer can't survive without a Windows base.

    --
    Lawyers, MBA's, RIAA? A jedi fears not these things!
  37. A note about pipeline stages by Anonymous Coward · · Score: 2, Interesting

    The reasons that Intel has for increasing the # of pipeline stages seem, to me, more for marketing than actual performance.

    By increasing the # of stages (say, to do less work per stage), they're able to minimize interconnect delay (among other things), and therefore bump up the processor speed.

    It doesn't mean they'll be able to do more -- in fact, they're doing less per stage, just at a faster rate. (Whereas I suspect the Athlons are doing more per stage, and that's why we're seeing 2GHz Athlons tying or beating 3.2GHz Pentiums.)
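
    Here is a toy model of that trade-off: splitting a fixed amount of logic over more stages shortens the cycle, but every mispredicted branch then throws away more cycles. Every constant below is made up for illustration (it is not Intel or AMD data), and the "best" depth it finds depends entirely on those assumptions.

```python
LOGIC_DELAY_NS = 10.0         # hypothetical: total logic delay for one instruction
LATCH_OVERHEAD_NS = 0.2       # hypothetical: fixed latch/skew cost added per stage
MISPREDICTS_PER_INSTR = 0.05  # hypothetical: mispredicted branches per instruction


def cycle_time_ns(stages: int) -> float:
    """Each stage does 1/stages of the logic, plus the fixed latch overhead."""
    return LOGIC_DELAY_NS / stages + LATCH_OVERHEAD_NS


def clock_ghz(stages: int) -> float:
    return 1.0 / cycle_time_ns(stages)


def time_per_instruction_ns(stages: int) -> float:
    """Ideal CPI of 1, plus a flush costing `stages` cycles on each mispredict."""
    cycles_per_instr = 1.0 + MISPREDICTS_PER_INSTR * stages
    return cycles_per_instr * cycle_time_ns(stages)


# Depth (1..100 stages) that minimizes time per instruction in this toy model.
best_depth = min(range(1, 101), key=time_per_instruction_ns)
```

    In this model the clock keeps rising with depth, but time per instruction bottoms out and then gets worse, so a higher GHz number past that point buys nothing.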

    Marketing-wise, it'll be a win for Intel. Performance-wise (due to pipeline stalls), these changes will demand that Intel keep bumping up chip performance or else lose out to AMD. Of course, we all know which of these two criteria is the most important to the bottom line.

  38. Technical discussion by Rufus211 · · Score: 4, Informative

    For those into the technical side of this type of stuff and a heck of a lot higher S/N ratio, check out the Ace's Hardware forum. There's a large thread going on over there talking about the rumors and what it would actually mean.

  39. Re:Why? by Sivar · · Score: 4, Interesting

    More clockspeed = more sales. 95% of computer users (or is it 94%, with recent improvements in public education) believe in the MHz Myth mentioned on the front page.
    The MHz myth is the belief that the OneTrue measure of CPU performance is clockspeed. A 2GHz CPU is twice as fast as a 1GHz CPU. A 4GHz CPU is twice as fast as a 2GHz CPU.

    While it may not seem common to many of us, if you speak with a large number of average people about computer performance, you will quickly want to kill yourself. Or them. Or both.

    This isn't the fault of the general public, as Intel's marketing machine takes advantage of this common belief. Intel Pentium IV processors are some of the highest clocked processors in the world, and they benefit from everyone that thinks this somehow matters.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  40. what, are you an expert? by mrm677 · · Score: 4, Insightful

    "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls"

    Get off your high horse. Intel architects aren't dummies. Itanium benchmarks are starting to whoop some serious ass and the P4 and Athlon have been neck-and-neck for years. I'm sure Prescott will perform very well.

    I can get into all kinds of architecture speak as to why your simplistic notions of mispredictions and pipeline stalls might not be so terrible. Who knows? Maybe Intel will execute both paths of a branch? They've already got partial instruction replay to make squashes much less expensive. With deep speculation, a big instruction window, good bypassing capabilities, and effective non-blocking caches, "pipeline stalls" are not an issue due to branch mispredictions. The bigger issue is memory latency/bandwidth and Intel has always done well with that. A branch misprediction can be easily tolerated...an L2 cache miss can't.

  41. "As most of us know..." -- riiiight by nazgul000 · · Score: 2, Insightful

    "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls."

    Sigh... most of the people I know cannot place the planets of the Solar System in their correct order. What a rarefied realm we inhabit here...

  42. Summary of article by utahjazz · · Score: 2, Funny

    A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor [colorado.edu], ISCA 2002.

    Let me guess...42?

  43. Re:Silly intel by toddestan · · Score: 2, Informative

    AMD's higher end is a bit pricey, but that's to be expected. Intel can't compete in the mid range, and is getting totally killed on the low end. An Athlon XP 2200 is around $60. That's less expensive than the slowest P4 - the 1.4GHz. It's even cheaper than the lowly Celeron 2.2GHz. In that sense, Intel is way overpriced. By the way, Intel's latest chips run just as hot as their AMD counterparts. The days of the cool running PIII are over.

  44. Re:Why? by ProtonMotiveForce · · Score: 2, Insightful

    Out here in "Reality World", as I like to call it, it _does_ matter. You see - performance is performance, whether it comes via IPC or high clock speed.

    Until the Athlon64/Opterons AMD had no answer to the P4. They just couldn't quite keep up. And you people harped on the same thing "Ooh, it's a marketing gimmick!".

    You want a marketing gimmick? How about selling a 64-bit CPU to people who have like 512M of memory. There's your gimmick.

  45. Re:Why? by GerryGilmore · · Score: 2, Interesting

    Before you run off blaming the evil Marketing demons, let me ask you this.....what readily quantifiable measure would you use instead to compare systems for the broad range of users and applications - all other things being the same? (memory, disk, etc.)

    Imperfect a measure that it may be, it's a hell of a lot easier to relate to and compare than "how many FPS of Quake3 can I get?" or "how quickly can it compile the 2.6 kernel?"

  46. Matlab, Schmatlab, I want to write some code! by Latent+Heat · · Score: 4, Informative
    Matlab is to the academic-scientific-engineering world what Visual Basic is to the accounting-business-data processing world.

    Your EE or ME or ChemE full professor as a grad student could have written a FORTRAN program to compute some stuff and write output to a numeric text file or perhaps draw some plots using a subroutine library. You are probably thinking that anyone who can't sling together C programs using VI to draw graphics straight to X is a luser, but I am talking about pretty technically savvy people who don't have time to spend on this stuff and who employ armies of Engineering majors from foreign lands who are not up on this stuff either.

    My own take is that if a particular numerical calculation can be easily programmed by some package, it must not be on the cutting edge of research because someone has already done it. Besides, if your software package is really deep, most of the effort goes into the architecture and the data flows and into graphics, and the RAD bit is only simplifying a tiny part of what you are spending your time on. A high-power scientific data visualization is really a video game, and how many video games are implemented in Matlab?

    But what Perl is to text processing, Python is to collections, and VB is to slinging together a GUI, Matlab is to numerics (what used to be FORTRAN libraries) -- it may not have the best algorithms, but it has a lot of algorithms -- it has a semi-decent scripting language, and it has some facility with producing plots from your computations and other data.

    Now that's the thing -- if you are doing matrix operations or using some canned function (most likely C under the hood), Matlab is as fast as fast can be. The minute you start looping in Matlab, it is interpreted and the speeds are in the Python range.
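
    The same split between "canned function" and "interpreted loop" shows up in Python itself, so here is a small sketch using Python as a stand-in (this is not Matlab code, and the timings depend entirely on your machine). The built-in sum() runs its loop in C; the hand-written loop goes through the interpreter on every iteration.

```python
import timeit

data = list(range(100_000))


def looped_sum(values):
    """Explicit loop: every iteration is executed by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Canned function: the loop runs inside the C implementation of sum()."""
    return sum(values)


loop_time = timeit.timeit(lambda: looped_sum(data), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(data), number=20)
print(f"interpreted loop: {loop_time:.4f}s   built-in: {builtin_time:.4f}s")
```

    Both functions return the same answer; only where the loop executes differs.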

    Before you knock it completely, it has very good integration with Java modules -- more seamless than with C modules. While Java may be pokey for its GUI, for tight numeric loops the JIT is almost as fast as C -- no joke, a person should consider writing numeric extensions to Matlab in Java of all things, especially on Windows where they tweaked up Java 1.4.2_03. And how many scripting languages (OK, Jython) have this level of Java integration?

    But as a scripting language, Matlab has its shortcomings. It started out as a matrix calculator and has had features grafted on in a hodge-podge Visual Basic 6.0 kind of way. In terms of its data type restrictions and fubar scoping rules and brain-dead object extensions, I don't think, as they say, it scales very well.

    My other peeve is that it is proprietary, and while Math Works is not Microsoft, I worry if engineering schools, emphasizing use of "commercial packages students will use in the real world when they graduate" (as opposed to professors dinking around with their homebrew software for use in instruction), are becoming trade schools shilling for the big software houses. I don't have a lot of experience with it, but in place of Matlab we should be using stuff like Python and the Python NumPy extension -- Open Source alternative, comparable performance, C extensions for speed, but much more Turing complete, consistent, and scalable.

    And where is Matlab 6.5 using Java internally? Try doing a Files Open to start editing a Matlab script (M-file) with the Matlab editor window. One potato, two potato, three potato, and the window comes up. Now what language has that kind of GUI lag, I wonder what it could be?

    1. Re:Matlab, Schmatlab, I want to write some code! by dasmegabyte · · Score: 2, Insightful

      The reason Java GUIs are pokey for the most part is that people have been SPOILED by OOP. If you create a New window every time, then yes, it'll be slow, because Java has to basically learn how to make the window in the given OS, lay it out, and populate it, all before it can display it (as opposed to VB/.NET, which apply very sneaky, often exasperating hints on how to make windows).

      Really, the New window should be made once, the optimizations saved in the assembly cache, and the same window used for subsequent calls. Some of the faster, non-Sun VMs do this kind of thing whether you tell them to or not.

      --
      Hey freaks: now you're ju
    2. Re:Matlab, Schmatlab, I want to write some code! by biostatman · · Score: 2, Insightful

      My other peeve is that it is proprietary

      You should try R. Free as in beer + speech, high level scripting, can link in compiled low level code (C, FORTRAN, maybe even Java), good graphics output, good matrix handling, lots of 3rd party extensions (most GPL'd). Not good for symbolic mathematics, though. Used heavily in the statistical community and actively developed by some very smart people.

      --
      For the love of $DEITY, loose != not win!!!!!
    3. Re:Matlab, Schmatlab, I want to write some code! by Dr.+Zowie · · Score: 2, Insightful
      Unfortunately, Matlab is still a category killer for certain kinds of pipelining. But the various open-source data analysis languages are coming on strong. Perl Data Language, Numeric Python, Octave, R -- they're all worth a look, though at least the first three fit the IDL niche a little better than the MatLab one. I'm not as familiar with R as I probably should be.

      Unfortunately, all of 'em (including MatLab) suck if you're working with chunks of data that are bigger than your cache, because you end up pumping stuff out over the main bus.

  47. Re:Why? by Sivar · · Score: 4, Interesting

    " Out here in "Reality World", as I like to call it, it _does_ matter. You see - performance is performance, whether it comes via IPC or high clock speed."

    Yes, high clockspeed "speed demon" chips can and often do outperform high-IPC "brainiac" chips. Whether the final performance of the fastest Pentium IVs ends up being as high or even higher than the fastest competitor does not change the fact that Intel has made no effort to dispel the MHz myth--and it IS a myth--and has in fact encouraged it.
    I said nothing of final performance figures. I was stating that the marketing gimmick is that MHz is an accurate measure of speed, which it is not--even between different revisions of Intel's own Pentium IV core, let alone in comparison to their competitors.

    "Until the Athlon64/Opterons AMD had no answer to the P4. They just couldn't quite keep up. And you people harped on the same thing "Ooh, it's a marketing gimmick!"."

    Athlons and Pentium IVs have been leapfrogging each-other for years. If you believe that 32-bit Athlons were never competitive with Pentium IVs, you are quite mistaken. I would be happy to help you research the issue.

    "You want a marketing gimmick? How about selling a 64-bit CPU to people who have like 512M of memory. There's your gimmick."

    You may not be aware of this, but it is actually an intelligent idea to fix problems before they become problems.
    --LBA-48 was introduced before more than a tiny fraction of people had hard drives that were larger than the 128GB limit. Is it a marketing gimmick that LBA-48 supports multi-petabyte drives? (2^48-1 512 byte sectors).

    --Serial ATA, and even ATA100 were introduced long before any hard disk drive could possibly approach 100MB/sec sustained transfer rate. Even today's world's fastest hard drive, the Fujitsu MAS3735, cannot quite reach 80MB/sec. Did you know, however, that the same situation occurred with ATA66, ATA33, ATA16, etc.? Perhaps engineers should have waited until the performance barriers were making drive upgrades pointless before introducing faster means of communication? After all, "no hard drive could possibly even approach 33MB/sec" --1995.
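
    For anyone who wants to check the LBA figures above, the arithmetic is short. A quick Python sketch, assuming 512-byte sectors and counting every addressable sector (function and variable names are mine):

```python
SECTOR_BYTES = 512  # classic ATA sector size


def max_capacity_bytes(lba_bits, sector_bytes=SECTOR_BYTES):
    """Largest addressable capacity for a given LBA width."""
    return (2 ** lba_bits) * sector_bytes


GIB = 2 ** 30
PIB = 2 ** 50

lba28 = max_capacity_bytes(28)  # the old limit
lba48 = max_capacity_bytes(48)  # the LBA-48 limit

print(f"28-bit LBA: {lba28 // GIB} GiB")
print(f"48-bit LBA: {lba48 // PIB} PiB")
```

    So the old ceiling is the familiar 128 "GB" (binary units), and LBA-48 moves it out by a factor of 2^20.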

    The same applies to 64-bit processors.
    The average Dell comes with what, 256MB RAM? Probably 512MB now? That is 1/8 of the "4 GB barrier" of 32-bit pointers. Actually, that barrier is either 1.5GB, 2GB, or 3GB depending on your operating system.
    Now, let's think: Have you ever seen the average amount of RAM in a system double? I seem to remember 4MB being "plenty" and 16MB being "wasteful and ridiculous". I seem to remember 32MB being the standard, and anything over 128MB was an unwise waste of money.
    Do you think that maybe, possibly, that pattern might repeat? Perhaps--since it has happened every few years for decades--the average amount of RAM in a system might increase? Applications might want more than 4GB of address space? Quake 5 may require 6GB RAM minimum (16GB recommended)?
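
    The doubling argument can be counted out explicitly. A small Python sketch, assuming a 512MB starting point and a flat 32-bit address space (it ignores the lower per-process limits that operating systems impose):

```python
MIB = 2 ** 20


def address_space_bytes(pointer_bits):
    """Bytes addressable with a flat pointer of the given width."""
    return 2 ** pointer_bits


def doublings_until(start_bytes, limit_bytes):
    """How many times start_bytes must double to reach limit_bytes."""
    count = 0
    size = start_bytes
    while size < limit_bytes:
        size *= 2
        count += 1
    return count


limit_32 = address_space_bytes(32)
typical_ram = 512 * MIB  # assumed "average Dell" configuration

print(f"fraction of 32-bit space in use: {typical_ram / limit_32}")
print(f"doublings until RAM fills it:    {doublings_until(typical_ram, limit_32)}")
```

    Three doublings is not many if RAM keeps growing the way it has.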

    In case you were not aware, the 64-bit mode of the Athlon64 provides real performance benefits, whether software cares about the extra address space or not. Many algorithms, particularly encryption, data management, HL math, high precision math, media en/decoding, and compression can make use of the larger register size.
    The fact that there are double the number of GPRs (that stands for "General Purpose Register". Ohhh, ahhh) and that the amount of data that one can fit into those GPRs has quadrupled, helps ALL software that is more than a 20-line assembly language experiment. Hell, even having 16 GPRs (twice as many as previous x86 chips), the AMD64 architecture is still considered register-starved. Look at the PowerPC, the IA64, the AXP, the UltraSPARC, and just about any other mainstream high-performance processor architecture.
    You may want to look at the reviews from reputable publications showing substantial performance gains from 64-bit Opteron software, including software that could not care less if you have >4GB of memory. Hint: Tom's Hardware is not on that list.
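
    One way to picture the register-width benefit for high precision math: big-number addition proceeds one register-sized "limb" at a time, so wider registers mean fewer steps. The Python sketch below only counts limb additions; it is a toy model, not how any particular bignum library or the Athlon64 actually schedules the work.

```python
def add_in_limbs(a, b, limb_bits):
    """Add two non-negative integers one limb at a time.

    Returns (sum, number_of_limb_additions), mimicking add-with-carry
    on registers that are limb_bits wide.
    """
    mask = (1 << limb_bits) - 1
    result = 0
    carry = 0
    shift = 0
    steps = 0
    while a or b or carry:
        limb_sum = (a & mask) + (b & mask) + carry
        result |= (limb_sum & mask) << shift
        carry = limb_sum >> limb_bits
        a >>= limb_bits
        b >>= limb_bits
        shift += limb_bits
        steps += 1
    return result, steps


# Two arbitrary operands that fit in 128 bits.
x = 2 ** 126 + 12345
y = 2 ** 125 + 67890

sum32, steps32 = add_in_limbs(x, y, 32)
sum64, steps64 = add_in_limbs(x, y, 64)
print(f"32-bit limbs: {steps32} additions, 64-bit limbs: {steps64} additions")

# Total register-file capacity: 8 x 32 bits versus 16 x 64 bits.
capacity_ratio = (16 * 64) / (8 * 32)
```

    The capacity ratio works out to the "quadrupled" figure in the post: twice the registers, each twice as wide.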

    Is a 10%-30% performance boost a gimmick?

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  48. Re:Why? by Sivar · · Score: 3, Informative

    Before you run off blaming the evil Marketing demons, let me ask you this.....what readily quantifiable measure would you use instead to compare systems for the broad range of users and applications - all other things being the same? (memory, disk, etc.)

    Imperfect a measure that it may be, it's a hell of a lot easier to relate to and compare than "how many FPS of Quake3 can I get?" or "how quickly can it compile the 2.6 kernel?"

    That very question has long been a topic of heated debate. Years ago, AMD launched an initiative to create a nonbiased (so they say), general purpose universal benchmark. It never went anywhere as far as I know.
    Overall, Winbench 'XX is a good benchmark because it shows actual performance in real-world applications (albeit somewhat old ones). For games, the only reliable means of benchmarking is to test those individual games, or at least assume similar performance across many games that use the same game engine. The game industry is converging because of the extreme difficulty of developing truly sophisticated 3D graphics engines. I predict that within 5 years, there will be at most 3-5 major game engines used by 90% of high-budget games. A general benchmark of these 3-5 engines (or however many there turn out to be) could be used, either taking their average and giving an overall "gaming score", or predicting the performance of the many games built on each engine from extensive benchmarking of a few titles that use it.

    Server benchmarking is not an issue, because those involved in the tests often know what they are doing.

    As far as unix benchmarking, well, that is a major pain in the ass. That certainly does not mean that we should rely on clockspeed, or god forbid on BogoMIPS. A standard benchmark based on the compilation time of a certain version of BASh was proposed not too long ago. Because many Unix geeks are developers, this would not be a bad start. As for pure CPU tests, perhaps a mix of BZip2, large-scale encryption, and ... other things might be good. As with any benchmark, there are always caveats and special conditions involved. If one simply averages the scores of many benchmarks, things happen such as one candidate doing ridiculously well on one (possibly unimportant) part of the benchmark, thus throwing the average way out of kilter.
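
    The averaging problem is easy to demonstrate. In the Python sketch below the scores are invented speedups relative to some baseline machine; the geometric mean is shown as one common way to blunt a single outlier, which is my addition and not something proposed above.

```python
from statistics import geometric_mean, mean

# Invented relative scores (1.0 = same speed as the baseline machine).
chip_a = [0.9, 0.9, 0.9, 4.0]   # slightly slower on three tests, one huge win
chip_b = [1.4, 1.4, 1.4, 1.4]   # consistently faster everywhere

for name, scores in (("A", chip_a), ("B", chip_b)):
    print(f"chip {name}: arithmetic mean {mean(scores):.3f}, "
          f"geometric mean {geometric_mean(scores):.3f}")
```

    The plain average ranks chip A first on the strength of one test, while the geometric mean ranks chip B first. Neither number replaces looking at the individual results.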

    Benchmarking is a science, an art, and a rather large pain in the ass.

    Your point is well taken though.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  49. Re:Why? by pastafazou · · Score: 2, Interesting

    The problem in killing the myth is the dominance Intel has in the processor market. The average Joe is force fed "Intel inside" everywhere he looks, and the sales people in most stores don't bother to explain the differences between different architectures (or they just don't know). Intel has capitalized on this by pushing their architecture heavily towards higher clock speeds, at the cost of many other efficiencies. It's simply MHz & GHz that everyone mentions. AMD, IBM, Apple, Sun, Motorola, etc. should start pushing something else that can be realistically measured. Maybe someone can do the conversion from clock speeds and GigaFlops to horsepower and Torque? Start talking in powertool talk, and a huge chunk of the population will suddenly start to understand a bit better.

  50. More details on Intel's processor by rice_burners_suck · · Score: 5, Funny
    Intel today announced its new 1024-hexabit microprocessor architecture technology. Named the Quantium, Intel's new processor core boasts powerful new technologies which will enable governments to better manage the rights (or lack thereof) of their subjects.

    The Quantium has the following new features:

    • Intel (r) LightSpeed (tm) technology breaks the processing pipeline into 299,792,458 discrete steps. As there is no internal clock within the processor, all operations occur at the speed of light. Hence, one "cycle" represents the absolute cosmic measure unit of time and all operations occur in one cycle. While this will not increase the processor's performance--indeed, it will pale in comparison to that of the ancient 80286 processor of old folklore--the faster internal clock speed is expected to increase Intel's sales by 0.000001% within 180 quarters.
    • Intel (r) SingleAtom (tm) technology squeezes the entire processor into a single atom by modifying the universe at the M-theory level. Individual strings compose modified quarks and other subatomic structures, which combine to form a very heavy atom, one with approximately the same weight as 1 million protons. As the matter is extremely dense, the radioactive decay, combined with the gravity generated by itself causes the configuration of the subatomic particles to remain bonded at the subatomic level while realigning a nearly infinite number of times every second. This realignment constitutes the execution of instructions within the SingleAtom (tm) processor.
    • 893,378,665,113 new operations have been added since the previous model, bringing the new total to over 18 googleplexes of instructions. All SCO intellectual property can be programmed in a single instruction, increasing SCO revenues. Corporations will have to pay $799 per processor instruction executed, or face serious legal action.
    • RAM has been deprecated. 4 billion exabytes of internal general-use registers allow software to make more efficient data access, providing a more compelling Internet experience over a 28k modem connection.
  51. Lab to market lag time: 4 years by Anonymous Coward · · Score: 2, Interesting

    I knew they were up to something when this mail appeared on the linux-kernel mailing list in 2000. 4.3 GHz, indeed!

  52. More misinformation -- for "MHz Myth" fans by 0x0d0a · · Score: 3, Informative

    You have to remember that a garden variety PC is a very unpredictable environment. You have network packets coming in, mouse events, keyboard presses, USB chatter, DMA access; every event generates an interrupt that requires the processor to stop what it's doing, and start the pipeline over again.

    It's nothing personal, but articles like this one, as well as posts like this, drive me absolutely batty with the amount of incorrect ideas propagated. It's not that one particular person is misinformed -- it's just that the amount of generally bogus information is silly.

    First off, at some point, as far as I can tell, a bunch of people read Maximum PC or somesuch consumer "PC enthusiast" magazines, and read about "The Megahertz Myth". Maybe Ars Technica ran the story that started all this. Heck if I know. All that the original author was trying to do was point out that people shouldn't judge processors strictly by clock speed.

    Boy, did they ever create a monster. Somehow, a bunch of folks managed to get the idea that Intel was pulling this as some sort of PR job to deliberately trick people into buying their processors. For Chrissake, this is such an incredibly stupid idea. The OEMs have purchasers that know what they're buying. They aren't going to just sit down and look at benchmarks; they're going to have a bunch of test machines built when deciding what to go with. That and business considerations outweigh any "MHz rating". The OEM market just plain doesn't care. The only people getting excited about the "MHz Myth" are the "PC enthusiasts", a tiny, tiny sliver of a group when it comes to dollar value. If the sort of "PC enthusiast" riffraff really think that they constitute any kind of a significant market to Intel -- enough for Intel to *redesign their entire processor*, using a longer pipeline and higher clock rate, around getting them to purchase a computer, they are vastly overestimating their own importance in the universe.

    When Intel makes the decision about a new processor, it's a pretty safe bet that they don't run out and say "Gee, how would Joe Assmunch in Marketing like us to structure this thing?" They have many, many PhDs in chip and circuit design who have many competing ideas about what the best designs would be. They run many, many simulations before even thinking about deciding on major design decisions.

    The "PC enthusiast" folks who think that Intel has taken this path to trick those people that buy from Dell, and that, ho ho ho, *they* are smart enough to see through the trick are ridiculous. If Intel wanted a high clock rate to put on stickers, they could jack the thing through the sky, run at 10GHz, then demux data and only accept data at a lower rate into the various units. Some of the units would move to even more instructions per cycle.

    The *current* poster is talking about *keyboard* and *mouse* events? "USB chatter"? Those don't even show up on the *radar*. You roll that mouse, send your 200 Hz interrupts, and you worry about 200 measly mispredictions per second? Just blowing away the page table cache during process switches (which runs at 100 Hz on Linux 2.4 x86 by default) already dwarfs any misprediction performance hit from the said devices, and folks frequently bump it up by an order of magnitude or so and don't see any measurable performance hit -- on Pentium IIs.
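
    The "doesn't even show up on the radar" claim is simple to put numbers on. A rough Python sketch, assuming a 3 GHz clock and a 30-cycle flush per event (both are assumed round figures), and counting only the pipeline flush rather than the full cost of servicing an interrupt:

```python
def flush_overhead_fraction(events_per_second, flush_cycles, clock_hz):
    """Fraction of all CPU cycles lost to pipeline flushes from these events."""
    return events_per_second * flush_cycles / clock_hz


CLOCK_HZ = 3.0e9    # assumed clock speed
FLUSH_CYCLES = 30   # assumed penalty, roughly one 30-stage pipeline refill

mouse = flush_overhead_fraction(200, FLUSH_CYCLES, CLOCK_HZ)  # 200 Hz mouse
timer = flush_overhead_fraction(100, FLUSH_CYCLES, CLOCK_HZ)  # 100 Hz scheduler tick

print(f"mouse interrupts: {mouse:.6%} of cycles")
print(f"timer interrupts: {timer:.6%} of cycles")
```

    Even if the real per-interrupt cost were a hundred times larger than the flush alone, it would still be a tiny fraction of the available cycles.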

    As for DMA, the entire point of DMA is so that the processor *isn't* running code from the host. It can continue on in its own happy little world while a co-processor pokes at the memory bus.

    You might see significant branch misprediction issues with an inner loop with a branch statement that flicks back and forth just about every loop or so to screw over the branch caching. And "significant" is still pretty minor. The compilers hint to the CPU whether a branch is likely to be taken...it's not as if there's this massive, awful mistake that all the chip designers in the world are making that Joe I-Built-My-Own-Computer-
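
    For the curious, here is a toy simulation of the "branch that flicks back and forth" case against an ordinary loop branch. It models a single textbook 2-bit saturating counter, which is an assumption for illustration; real predictors keep branch history and would learn a strict alternating pattern, so this is not a claim about how the P4 predicts.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count mispredictions of a 2-bit saturating counter.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return mispredictions


loop_branch = [True] * 99 + [False]  # loop back-edge: taken 99 times, then exit
alternating = [True, False] * 50     # flips direction on every iteration

loop_misses = simulate_two_bit_predictor(loop_branch)
alternating_misses = simulate_two_bit_predictor(alternating)
print(f"loop branch: {loop_misses}/100 mispredicted")
print(f"alternating: {alternating_misses}/100 mispredicted")
```

    The ordinary loop branch costs one flush per hundred iterations in this model; only the pathological pattern hurts.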

    1. Re:More misinformation -- for "MHz Myth" fans by 0x0d0a · · Score: 2, Informative

      Errata for the above -- "Some of the units would move to even more instructions per cycle." should be "Some of the units would move to even more cycles per instruction."

  53. Re:Silly intel by Hoser+McMoose · · Score: 3, Insightful

    WTF? Please, just have a look at some IA-64 assembly code! It's NOT pretty, especially if you want it to go fast. You've got to do the whole explicitly parallel thing, manually pack together independent instructions according to what pipelines you want to run them in.

    Itanium is NOT a RISC machine like Sparc, not in the least. Sparc is much more closely related to x86 than it is to IA-64. The Itanium is a VLIW chip, or EPIC in Intel-speak. It's a whole different animal altogether.

    FWIW, here's a brief article where Intel talks about implementing a bubble-sort in IA-64 assembly vs. the original C. In particular, they start with the code that the Intel C compiler generates and optimizes it. Their final, optimized version of the algorithm is on page 5, and it's anything but easy.

  54. Re:Do you? by gr8_phk · · Score: 2, Insightful
    Thanks for the techno-babble. This guy's company obviously looked at real world performance. Their understanding of the cause may or may not be correct, but their conclusion (switch to AMD) is correct for them because they compared using the application that matters to them.

  55. Nice plug how about.... by gr8_phk · · Score: 2, Informative
    That's a nice plug for Matlab. Since plugs are not being modded off-topic today :-) Let me say that I know several people who use GNU Octave instead of Matlab. It does most of the same things, and it's free software. Some just for home use, and some working at small companies that couldn't afford Matlab. You can write code that works on both, so one guy uses Matlab at work and can run the same stuff on Octave at home.