Intel to Increase Stages in Prescott
Alizarin Erythrosin writes "Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline then Northwood. No official numbers have been released, but The Reg is saying an Intel spokesman said that 30 stages seems to be a reasonable estimate. As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls. 'And just as the PIII proved faster than the early P4s in some applications, it's likely that Northwood will similarly prove faster than Prescott, which has clearly been designed for speeds of the order of 4GHz.'"
With all these pipelines you'd think intel was Bush and Prescott was Afghanistan.
--
WHO ATE MY BREAKFAST PANTS?
Northwood was really unsatisfying. I found that for the money, it was too short with too few stages. While gameplay was fine, the lack of stages simply made the cost not worth it for me.
2 stars.
I have been pwned because my
Longer pipeline + equal number of errors == more data need to be redone == slower!
MOUNT TAPE U1439 ON B3, NO RING
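To put rough numbers on the "more to redo" argument above, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the branch fraction, the misprediction rate, and the idea that the flush penalty equals the stage count (in a real core the penalty is closer to the number of stages between fetch and branch resolution).

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once the cost of flushed work is included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles

# Hypothetical workload: 1 in 5 instructions is a branch, 5% of those are mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

cpi_20 = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20)  # 20-cycle flush
cpi_30 = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 30)  # 30-cycle flush

print(f"20-cycle flush: CPI = {cpi_20:.2f}")
print(f"30-cycle flush: CPI = {cpi_30:.2f}")
print(f"slowdown at equal clock: {cpi_30 / cpi_20 - 1:.1%}")
```

With these made-up inputs the longer flush costs a bit under 10% at the same clock, which is the gap a higher clock or a better predictor would have to close.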
It's not the size of your pipeline that counts... it's how you use it.
Isn't that what caused the Great Chicago Fire of 1871?
I work at an engineering firm. The deep pipelines in the current P4 perform so poorly with general number crunching (e.g. matlab) we have almost completely switched to Athlons and are seriously considering Opteron.
-ghostis
Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
I suspect AMD and even Apple are going to shrink Intel's bragging rights in that same time frame unless Intel gets their act together. From AMD's recent earnings report it sure seems somebody is buying Athlon 64's.
Intel blew it when they made the decision to let 32 bits ride for another 2 to 3 years. They look like old fuddy-duddies now. It's AMD and Apple via IBM that have the cool shit.
I'm kind of tired of the perpetual whining of armchair hardware designers. So the happy few, highly paid architects with 30 years' experience in the industry and a hundred published scientific papers, decide at Intel that the next-gen chip will have more stages, and they have to be called morons? How do you know better? Hasn't Intel produced the fastest chips on the market with each and every micro-architectural generation? Long pipelines = costly branch mispredicts, whoooaah, you're so bright, why don't YOU have the job leading the Prescott team? Branches can be predicted. Long pipelines can improve throughput. Microprocessors are all about trade-offs. Let the pros do the work and go back to playing Quake.
Just because the early P4s proved to be slower than the PIIIs doesn't mean the Prescott will be slower than the Northwood. Intel may very well have devised a way to yield better branch predictions or something of the sort. I definitely won't buy one right away, but I wouldn't be surprised if they don't have the problems of the earlier P4s.
It seems like Intel is continuing to make poor decisions. Their Itanium processor is lame, the P4 is ridiculously overpriced, and now they're planning on making more expensive yet less powerful processors. I personally suspect Intel is an 80% marketing, 20% product sort of company. I think I'll stick to AMD for now.
Hi there
When the processor branches, all the partially executed instructions in the pipeline are lost.
They could minimize this by creating two different conditional branch instructions for each condition: one for cases where the programmer expects the branch to occur most of the time, and one for cases where the branch rarely occurs. They could then optimize the pipeline behavior for each case. If it's a 'likely branch' instruction, it could start fetching instructions from the branch target. If it's an 'unlikely branch' instruction, it could prefetch the next instructions after the branch.
This would work well in loops where every time but the last, the processor branches back to the top.
Unknown host pong.
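The loop case above is easy to sketch. This is only a toy count of mispredictions under a fixed compile-time hint, not a model of any real instruction set; the function names are made up for the example.

```python
def loop_back_edge(iterations):
    """Outcomes of the branch at the bottom of a loop: taken until the final pass."""
    return [True] * (iterations - 1) + [False]

def mispredicts_with_static_hint(outcomes, hint_taken):
    """Count mispredictions when every execution is predicted as the hint says."""
    return sum(1 for taken in outcomes if taken != hint_taken)

outcomes = loop_back_edge(100)
likely = mispredicts_with_static_hint(outcomes, hint_taken=True)
unlikely = mispredicts_with_static_hint(outcomes, hint_taken=False)

print(f"'likely' hint:   {likely} mispredict(s) in {len(outcomes)} executions")
print(f"'unlikely' hint: {unlikely} mispredict(s) in {len(outcomes)} executions")
```

A correct hint on the back edge misses only on loop exit; a wrong hint misses on every other pass, which is why the hint has to come from someone (or some profile) that actually knows the branch.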
It'll most likely be slower per clock cycle.
What this means is that it will take a faster clock (4 GHz, for instance) to do the same amount of processing as the Northwood core. However, lengthening the pipeline should allow Intel engineers to achieve higher clock speeds, as the longest transistor path in each stage will likely be shorter (faster switching times).
In essence, Intel is attempting to increase the speed of their CPUs by focusing on increasing the clock speed (P4), while AMD is focusing on increasing the number of calculations per clock cycle (Hammer).
Of course, there are a lot of more complex tradeoffs that factor in (e.g. branch prediction). I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.
-=Lothsahn=-
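The clock-versus-work-per-cycle trade-off described above boils down to throughput = instructions per cycle x clock rate. A rough sketch follows; the IPC figures are invented for illustration and are not measurements of Northwood or Prescott.

```python
def instructions_per_second(ipc, clock_hz):
    """Throughput as work per cycle times cycles per second."""
    return ipc * clock_hz

def break_even_clock(old_ipc, old_clock_hz, new_ipc):
    """Clock the new core needs in order to match the old core's throughput."""
    return old_ipc * old_clock_hz / new_ipc

# Hypothetical numbers: the new core does 10% less work per cycle.
old_ipc, old_clock_hz = 1.0, 3.2e9
new_ipc = 0.9

needed_hz = break_even_clock(old_ipc, old_clock_hz, new_ipc)
print(f"break-even clock: {needed_hz / 1e9:.2f} GHz")
```

Anything the deeper design clocks above the break-even figure is net gain; anything below it is a slower chip with a bigger number on the box.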
Assume for a second that Intel's P4 design was really meant to boost GHz numbers easily (to guarantee victory in the GHz war if not the performance war). If so, is the Prescott design now a result of having to keep up with themselves? Obviously they could design a chip that is "faster" but runs at a lower clock speed than the P4s, but they've pushed the GHz number so much that now they're kind of hamstrung in their design options.
The Prescott will be faster than the Northwood clock for clock. The choice to make the CPU with a longer pipeline was obviously an engineering choice and not a marketing one.
Re-read the Register article. It's not the Intel guy who said 30 stages, it's the Register who is guessing. They're assuming that since it went from 10 to 20 before, it'll go from 20 to 30 now. It's not likely to end up being more than a few extra stages.
Although the Prescott core will have a longer pipeline, it will probably end up performing a bit better clock-for-clock against Northwood. This is due to a couple of reasons. Firstly, Prescott has 1 MB of on-die L2 cache. That's a good bit, and one could see how the P4 was helped by the 2 MB L3 cache in the P4 "EE". Secondly, the new P4 will have improved hyperthreading. It will also have somewhat improved branch prediction, and it implements PNI (Prescott New Instructions), which will require a recompile to help things out. All in all, I see the Prescott as being just as fast or faster per clock as Northwood, mostly due to the doubled L2 cache.
So, since Prescott has approximately a 30 stage pipeline, I guess Intel has decided to continue to ignore the low-power consumption market, leaving it open to people like VIA and Transmeta. This is really disappointing to a lot of folks in the embedded markets, who would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.
Word has it that VIA is readying a new x86 processor to their line that supposedly has P3-class FPU performance while maintaining the same levels of poser consumption as its predecessors. It is expected that this processor may actually have a big win in front of it for DirecTV boxes. With the extra CPU horsepower, it should be exciting to see what nifty features come out of this, especially considering most set-top CPUs generally just act as "traffic cops" for the data moving between ASICs. If they're really making the move to this class of processor, perhaps they've got more in mind.
--JT
I suppose that this makes having a good compiler a little more important. Compiling the same program for a G4 on a compiler other than GCC gave me a 100% speed boost. I don't know if branch mis-prediction came into play, but it had a conditional in its inner loop (it displayed the mandelbrot set).
It sounds like Intel has totally given up on efficiency, and has the Marketing department doing processor requirements now... (has to clock to xGHZ!)
I've been working with Dual Opterons for a few months now, and have been very impressed by their speed, heat dissipation, and bang for the buck.
A large data transformation job (really doing a scrape of a mainframe report for data) on the order of 1.1GB processed much faster on an IBM E325 Dual Opteron 2.0ghz running 32bit Windows (ack) than my Dual 2.4ghz Xeon (w/HT) running Windows (double ack)....
Yeah- it's not a benchmark, but it is real world performance.
I had found an interesting article exposing the innards of the 775 pin Prescott -- see it here
(Credit: Got it off The Register from this article)
Let me guess - 'Alizarin Erythrosin' is Cupertinus Elvish for 'Mac User', right?
As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.
no, i didn't know that
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
How come your computer takes seconds to multiply two 400 digit #s, but ages to factor them?
[Fuck Beta]
o0t!
will have a longer pipeline then Northwood
it should be "than", not "then"...fix it people
...since my next computer is going to house a G5.
Personally I'm tired of trying to keep up with the GHz war between AMD and Intel. With our current technology, the only areas really pushing processing speeds are gaming and video/image applications (that I'm aware of). My grandmother doesn't need a P5 4 GHz to check her email, and neither do I if I simply want to write a paper.
And the masses cried out, "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0!"
A friend of mine's husband works for Intel. In fact, he was in the FPU division last time I checked.
This man I wouldn't sign on to design me a doghouse!
His checkbook was a horrid mess, he got basic math wrong.
Yet, he is designing critical areas of Intel's high-end chips?
Karma Whoring for Fun and Profit.
yep, that would be _way_ less. of course Iraq *needs* democrazy so it's for their own good! after all, it's what we pay taxes for...
CB
free ipod and free gmail!
It's the difference between killing during war and a mafia hit.
In case anyone wants some hard facts:
A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor, ISCA 2002.
M.S. Hrishikesh, Norman P. Jouppi, Keith I. Farkas, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar (UT Austin, Compaq): The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, ISCA 2002.
Eric Sprangle , Doug Carmean (Intel): Increasing Processor Performance by Implementing Deeper Pipelines, ISCA 2002.
A. Hartstein and Thomas R. Puzak (IBM): Optimum Power/Performance Pipeline Depth, MICRO 2003.
What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.
Read "Understanding Pipelining and Superscalar Execution" http://slashdot.org/articles/02/12/19/1810214.shtml?tid=137 . Extended pipelines _can_ improve performance. However, the compiler _needs_ to understand how to take advantage of it. Otherwise, you could end up with slower code.
oh yeah, a site that talks about COLUMBO visiting Clinton? Yep, that does seem like a relevant site! But it's on the internet, so it must be true!
Please go back to watching Fox news now, catch ya later!
CB
free ipod and free gmail!
The Intel Fanboy Handbook. It's similar to The Republican Scumbags Handbook(called The Republican Handbook for short).
Programmer time is much more expensive than faster machines.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
Well, like... Duh!
Geez! Who doesn't know that?!?! Why even mention it.
I've made up my mind and now I've got to lie in it.
Gosh, I'm feeling really left behind, my G4 400 only has 4 stages in its pipeline. At least it's built on a .22 micron process as opposed to the Pentium's measly .13 micron process.
Yes, that was a joak
Thanks for the references. At least it'll look like I'm working, though I'll still be procrastinating.
(\(\
(^.^)
(")")
*beware the cute-bunny virus
I've not helped to design an operating system or really any part of an operating system, but I can damn well tell you that Windows ME was a shitty OS. It doesn't take any experience for me to tell this; I can determine this by simple observation.
When the tire of my car explodes in an open road, it would not take much expertise on my part to diagnose it as a problem with my tire (they really aren't supposed to explode). And, when it happens to many other people with the same tire, it wouldn't take any expertise on my part to determine that it is probably a flaw in that tire design.
If indeed long pipelines make non-predictable/chaotic software cause more mispredicts, and I notice that those applications do indeed run more slowly (or fail to see a speed improvement) on a new, more expensive, Intel processor, then I can assume without expertise that the design of the processor is not fitting for those applications.
Also, when Intel's experienced engineers make a design decision, it might not be with the purpose of speed. In fact, I think few decisions there are. Intel, like Microsoft, is a marketing company. They like big numbers because they attract customers. Customers don't necessarily want really fast matlab, they want to be able to say "4 Ghz" because it makes them feel special.
So, please don't be frustrated with people for making simple, astute observations. Intel engineers (with over 30 years' experience) don't necessarily have our best interests in mind.
Generally one of the best processor architecture books out there is Computer Organization and Design. It does assume an amount of digital logic design (flipflops, clock, multiplexors and other basics) though it does have an appendix which briefly glosses over those. Honestly, to really "get" it you need an education in it.
-
Intel has shown no real interest in joining the 64-bit fray. Indeed, they don't have much choice. To release a 64/32-bit chip at this point would truly create an Itanic out of the Itanium. Microsoft would have more or less wasted its time producing low-volume products such as SQL Server 64 and XP 64 (different from XP 64-bit Extended, which has yet to be released). Other consequences of such a shift in strategy would include a number of people who invested in the Itanic platform ending up as the proud owners of an all but useless, yet very expensive, hardware platform.
.NET framework is not 64-bit ready. We can probably expect its release with VS.NET Whidbey, a.k.a. .NET 2.0.
Most real world tests point to AMD chips being faster. The Int and Floating Point Tests still belong to the P4 3.2, but the P4 is having to pass the 1st place trophy to AMD when it comes to games and office productivity.
And then there is price. For $320 you can get $700 worth of Intel performance. Mind you this is the AMD64 running in 32-bit mode.
It would appear that all that is really needed to justify mass market adoption is a consumer OS, that would be Windows XP 64-Bit extended. Currently in Beta. The only delay there is that the
After that - we just need to see some AMD adoption in the mainstream pc builders.
It's much more likely the size of the L2 cache is affecting you (i.e. your working set does not fit into P4's L2 cache but it does in Barton's).
If you don't believe me, try the demo version of the Intel VTune performance analyzer on matlab running one of your programs.
How well your caches perform is probably the most important thing for a processor today, as the speed of the main memory is a couple of orders of magnitude under the speed of the processor. It takes a couple of hundred cycles to service an L2 miss, while a long FP operation takes at most 20 cycles.
The Raven
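A quick way to see why a working set that spills out of L2 hurts so much is the usual average-access-time arithmetic. The sketch below borrows the "couple of hundred cycles" figure from the comment above; the hit latencies and miss rates are assumptions picked for illustration, not measured P4 or Barton values.

```python
def average_access_cycles(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory):
    """Average cycles per memory access for a two-level cache hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)

MEMORY_CYCLES = 200  # "a couple of hundred cycles" to service an L2 miss

# Hypothetical: same L1 behaviour, but the working set either fits in L2 or does not.
fits_in_l2 = average_access_cycles(2, 0.05, 10, 0.02, MEMORY_CYCLES)
spills_l2 = average_access_cycles(2, 0.05, 10, 0.30, MEMORY_CYCLES)

print(f"working set fits in L2: {fits_in_l2:.2f} cycles per access")
print(f"working set spills L2:  {spills_l2:.2f} cycles per access")
```

Under these assumptions the average access roughly doubles in cost once the data stops fitting, with no change at all to the pipeline.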
It's not the length of your pipeline; it's the thickness.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
I read somewhere that on the P4, when an instruction is already in the L1 cache, the pipeline gets shortened. That's because the L1 instruction cache stores pre-decoded instructions (micro-ops). This means that when the instruction is reached again, the decoding (and branch prediction?) steps are already done, shortening the pipeline. When the instruction is not in cache, there's already a big hit anyway. With that in mind, we'll need to see whether the extra pipeline stages in Prescott will still be there when the instruction is in the L1.
Opus: the Swiss army knife of audio codec
No processor, barring a complete architecture change (in which case its a different processor entirely) will double its performance simply by doubling the clock speed.
It really depends on how you define performance too and what your software is doing. Doing heavy I/O? Processor has little to nothing to do with I/O - it just hands it off to the bus and I/O controllers to take care of and then does something else while waiting for the interrupt.
-
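As a rough illustration of the clock-doubling point above: if part of a job's run time is spent waiting on memory or I/O that does not get faster with the core clock, doubling the clock cannot halve the total. The split below (2 s of compute, 1 s of waiting) is invented, and the model assumes the program really does sit idle during the wait.

```python
def run_time_seconds(cpu_cycles, clock_hz, wait_seconds):
    """Total time: compute that scales with the clock plus waiting that does not."""
    return cpu_cycles / clock_hz + wait_seconds

CPU_CYCLES = 4e9     # hypothetical amount of compute
WAIT_SECONDS = 1.0   # hypothetical time stalled on memory or I/O

base = run_time_seconds(CPU_CYCLES, 2e9, WAIT_SECONDS)
doubled = run_time_seconds(CPU_CYCLES, 4e9, WAIT_SECONDS)
speedup = base / doubled

print(f"2 GHz: {base:.1f} s, 4 GHz: {doubled:.1f} s, speedup: {speedup:.2f}x")
```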
Even if we know that this could end up making the processor slower, it doesn't really matter, because Joe Idiot sees 4.0 GHz and thinks "Boy, this is the fastest thing out there". I have a friend just like that, who is somewhat more into computers than most but still not totally involved. Back when the Athlon XP came out, it was faster than a P4 clocked a couple of hundred megahertz above the XP's clock rate, and the benchmarks proved this, as did real-world use. Well, he didn't believe me; he thinks that the clock rate is everything, and bigger is better to him. He probably still thinks this, but I don't argue because it's a waste of my time. I doubt many of us will be able to change other people's minds about it either. Let's face it, most of the people in the world buy from Dell, IBM, Gateway, etc., and that's what they're selling, so that's what people are getting.
The only thing really holding AMD64 processors back is the lack of a supported 64-bit Windows edition. Sure, for those of us that run Linux/*BSD it's great, but since the majority runs Windows, sales won't increase dramatically until XP 64-bit comes out, so people will be buying Intel until then.
Yeah, we all know that Q3 and Microsoft Word are the best methods of testing platform-independent CPU (note: not GPU or GPU driver) performance... You should really lay off the crack pipe. For someone who wants to know how number crunching compares on either platform, Q3 and Word (and Photoshop) aren't going to tell them squat about how something like MATLAB will perform. Q3 is going to tell you more about the state of GPU tech and GPU drivers than about integer ops, and MS Word is obviously going to be better (supported) on MS Windows than on Apple anything. Photoshop is only relevant to people who work a lot with Photoshop, like desktop publishers. That PCWorld benchmark is the most worthless piece of garbage that somehow gets linked to each time people bring up performance comparisons between x86 and PPC, even though it has no bearing on the performance of the processes being discussed.
First paragraph of linked story contradicts your whole argument:
SANTA CLARA, CA -- NVIDIA Corporation (Nasdaq: NVDA), the worldwide leader in visual processing solutions, yesterday announced that NASA is using its technology to reconstruct Martian terrain from transmitted rover data in photorealistic virtual reality under the Linux operating system, allowing scientists to explore Mars in 3D as if they were actually moving freely on the planet's surface.
it's not controlling the rover, it's for reconstructing data sent back.
Intel is trying to move chips. One way to improve your sales is to drum up higher GHz for the uninformed masses. If you can do this while still producing competitive chips, you will outsell a similar-performing chip that runs 700 MHz or so slower than yours.
and yet memory isn't decreasing in latency of access @ the same rate. Intel should consider adding a pair of independent memory controllers to the CPU, or at the very least, to the MCH. and with the move to 64-bit pointers and math, doubling the cache line size to 128bytes.
AMD vs. Intel:
Barton AXP: 75W (power-cutoff overheat protection)
Northwood P4: 90W (underclock overheat protection)
Clawhammer A64: 85W ("Cool & Quiet" - underclock)
Prescott P4: 100+W (underclock)
I think AMD's heat management isn't the problem.
You can also go back and "fix" instructions to an extent (and not in all cases) while in the pipeline in case of incorrect branching. x86 sort of sucks for this though because of the variable length instructions.
A lot of computer science is based on those kinds of statistics. You see it in memory management as well. Most data structures are created and quickly destroyed. But those that aren't tend to stay around for a very long time and not point to quickly created and destroyed ones.
-
It is reasonable to assume that pipeline length is influenced by marketing. More MHz means a competitive advantage, even if it yields no extra CPU power. Athlon 64 shows that a 2.2 GHz 12-stage pipeline can be equivalent to the 3.2 GHz 20-stage pipeline of a P4 (the AMD model naming then tries to hide the lower MHz numbers).
This demonstrates that performance can be achieved in different ways, and intel went for "speed demon" - throw MHz at the problem. There are good reasons for this, but for example the "brainiacs" with lower MHz usually have lower power consumption. Maybe a more brainiac approach was opposed by marketing? They could never have pushed through a clearly inferior solution though.
Finally, I don't think that the Prescott pipeline will be quite as long as 30 stages, but time will tell. The architectural differences between Prescott and Northwood (the current P4) will be a lot smaller than those between Northwood and the P3. Therefore I expect that the number of apps that will run slower on a Prescott (which, of course, needs a somewhat higher clock frequency) will be very small. So move along guys, there's nothing to see.
Stay away from x86 if you're just starting out...
-
I thought his tone was perfect.
"Eve of Destruction", it's not just for old hippies anymore...
They could have gone with... VxWorks
And they did, dipshit.
AMD's only black mark is the K6, which until the K6/3 has only 24 bit FPU, and as such has many compatibility problems. Of course, if you're running linux, you'll never see them, so the faster K6s are not useless yet. (Cobalt Raq3 owners rejoice.)
I've got a bunch of old K6s at home: 450s overclocked to 500 in Asus TX97-LE's [Intel TX chipset] and laptop 550's in Epox MVP-3G5 motherboards [Via Apollo MVP-3 chipset]. Care to describe which K6s in combination with which software packages have "compatibility problems"?
At the moment, I'm most interested in LabView 7.0 and Java 1.4.2 running on Windows 2000 servers, although I may soon upgrade to Visual Studio .NET on Windows 2003 [still in conjunction with LabView 7.0].
I'm doing some pretty heavy duty math, and I better know if I should expect some lousy round off error [or whatever].
Anyway, thanks for any insight you can provide!
I'd eat hot grits for a chip that has "poser consumption."
I do not understand Intel nor other companies that do not try to develop anything besides x86. Let's face it, the architecture is flawed at its root. There are several issues that have been there from its very beginning(that is a topic for another forum) and instead of coming up with something new, Intel tries to patch its products with more crap.
When Apple realized that MOS Technology's clone of the 6800 was not the best solution, the architecture was replaced with a new one that better suited Apple's goals. Why can't Intel retire x86 and move up to something new? If you think about it, they can make a good chunk of money by coming up with a processor that can reduce AMD and Apple to next to nothing. If they want to compete with this CPU, they'd better have good branch prediction.
This is the end result of engineering driven marketing... When you relentlessly try to make the chip with the "most megahertz", you lose focus. AMD and Apple/IBM have started to pull away in quality, in terms of actual work done per clock cycle. While it's true that the average Joe or PHB might not know any better, you can only continue on so long...
"Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline then Northwood."
Should be "than"
Fair enough, I appreciate your candor.
CB
free ipod and free gmail!
"Branch likely" has been done - and deprecated - e.g. by MIPS.
You underestimate how much effort goes into branch prediction already. Modern predictors even notice that a certain branch is taken exactly every 4th time. Any hinting that can be provided at compile time - especially if it's done without profile feedback - would not add much value to this.
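To make the "every 4th time" claim above concrete, here is a small simulation comparing a single 2-bit saturating counter with a predictor that indexes its counters by the branch's own recent outcomes. It is a textbook-style sketch under simplifying assumptions (one branch, 3 bits of history, no aliasing), not a description of the P4's or any other real predictor.

```python
def two_bit_counter_misses(outcomes):
    """One 2-bit saturating counter; predicts taken when the counter is 2 or 3."""
    counter = 2  # start weakly taken
    misses = []
    for taken in outcomes:
        misses.append((counter >= 2) != taken)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

def local_history_misses(outcomes, history_bits=3):
    """2-bit counters selected by the last `history_bits` outcomes of this branch."""
    counters = [2] * (1 << history_bits)
    mask = (1 << history_bits) - 1
    history = 0
    misses = []
    for taken in outcomes:
        misses.append((counters[history] >= 2) != taken)
        c = counters[history]
        counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        history = ((history << 1) | int(taken)) & mask  # shift in the newest outcome
    return misses

# Branch taken three times, then not taken, repeated: 1000 executions in total.
pattern = [True, True, True, False] * 250
WARMUP = 100  # ignore the first executions while the tables train

simple = sum(two_bit_counter_misses(pattern)[WARMUP:])
with_history = sum(local_history_misses(pattern)[WARMUP:])
print(f"2-bit counter misses after warm-up: {simple}")
print(f"history-based misses after warm-up: {with_history}")
```

The plain counter keeps missing the not-taken pass of every period, while the history-indexed table learns that "taken, taken, taken" is followed by "not taken" and stops missing once trained. A static hint cannot express that pattern at all.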
Er, the topic is about Intel, I honestly think that there are more important issues we should be addressing. My posts here are simply a small ripple in the ocean that is /. with all of its goatse.cx, gnnas and fps post.
still, I appreciate your candor, and will drop the commentary for now.
CB
free ipod and free gmail!
--Wow, I can't believe this got modded as 'Insightful'. 3000+ is a performance rating that is designed to show the CPU performs equivalently to a P4-3Ghz.
If you look at some actual benchmarks, you will see that the P4 3.06 is actually better in some cases than an AthlonXP3000+ (note this is the 2.167Ghz Barton in the graph)
SpecFP
SpecInt
Additionally, the data shows that a 3Ghz P4 is in fact MORE than 3x faster at SpecFP than a 1Ghz P3. Perhaps you should inform yourself a little before posting FUD.
Intel getting closer and closer to registering every single gate
----
Go canucks, habs, and sens!
The reasons that Intel has for increasing the # of pipeline stages seem, to me, more for marketing than actual performance.
By increasing the # of stages (say, to do less work per stage), they're able to minimize interconnect delay (among other things), and therefore bump up the processor speed.
It doesn't mean they'll be able to do more -- in fact, they're doing less per stage, just at a faster rate. (Whereas I suspect the Athlons are doing more per stage, and that's why we're seeing 2GHz Athlons tying or beating 3.2GHz Pentiums.)
Marketing-wise, it'll be a win for Intel. Performance-wise (due to pipeline stalls), these changes will demand that Intel keep bumping up chip performance or else lose out to AMD. Of course, we all know which of these two criteria are the most important to the bottom-line.
Sometimes my jaw just drops when I see how much crap gets loaded on some of these business computers.
[And it also drops when I see how much spyware gets loaded on the typical home user's computer...]
For those into the technical side of this type of stuff and a heck of a lot higher S/N ratio, check out the Ace's Hardware forum. There's a large thread going on over there talking about the rumors and what it would actually mean.
I should never say "no" or "nothing" when it comes to chip architecture... somewhere in the world there is a family of niche devices that do things in weird ways for their own reasons.
-
The only reason Apple went with PowerPC is because MOT completely blew it on the 680x0 line by being extremely late, with horrible yields and crazy prices. 68040s and 68060s were good procs that would have been great had they come out a year earlier.
The same thing MOT went on to do with PowerPC.
--- I do not moderate.
I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).
Is this true? [I'm pretty sure it's true of Altivec, but that's the first I'd heard of it for SSE/MMX].
I thought it was known that they added 11, for a total of 31 stages. Ouch. It's looking more and more like one could stay with a highly overclocked Northwood while waiting for an Athlon 64 with all the new toys like PCI-X and some DDR-2.
1) Yes, I knew that.
2) I only read slashdot to karma whore.
3) I've heard Cowboy Neal has a longer pipeline
Nerd: Derogatory term typically directed at anybody with a lower Slashdot ID than you.
"As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls
Get off your high horse. Intel architects aren't dummies. Itanium benchmarks are starting to whoop some serious ass and the P4 and Athlon have been neck-and-neck for years. I'm sure Prescott will perform very well.
I can get into all kinds of architecture speak as to why your simplistic notions of mispredictions and pipeline stalls might not be so terrible. Who knows? Maybe Intel will execute both paths of a branch? They've already got partial instruction replay to make squashes much less expensive. With deep speculation, a big instruction window, good bypassing capabilities, and effective non-blocking caches, "pipeline stalls" are not an issue due to branch mispredictions. The bigger issue is memory latency/bandwidth and Intel has always done well with that. A branch misprediction can be easily tolerated...an L2 cache miss can't.
Look! IBM's perpetuating the MHz myth!
Guys, there's more to CPU architecture than what Apple's advertising department claims (or at least used to claim). I don't think anyone would doubt the PPC970/G5's superiority to the G4 performance-price wise (or has Apple somehow made a terrible mistake? ha), and yet it has a far longer pipeline than the G4. Perhaps there is more to pipeline size than trying to achieve a higher clock in exchange for less computation per cycle?
Or perhaps the only "megahertz myth" are Apple's vast simplifications of modern CPU technology?
For those of l33t h4x0r5 out there who are IEEE members, check out "Increasing Processor Performance by Implementing Deeper Pipelines" by Eric Sprangle and Doug Carmean of Intel's Pentium Processor Architecture Group in 2002. (Sorry about not having a link. I'm not at a location with IEEE Xplore access.)
In short, the paper describes how creating a deeper pipeline and increasing L2 cache can improve performance by 35-90% over a 2-GHz P4. This improvement is not dependent on process, so one may anticipate a similar improvement based upon the new process, although hard data is not available to me at this time.
The paper acknowledges branch misprediction as the leading cause of performance degradation and includes the penalty in the above mentioned statistics.
If there's anything I've learned about computer architecture, it's that there are always more factors than you know what to do with. Got a problem with branch penalty? Make a more accurate predictor. In the meantime, you increase throughput with a longer pipeline. Why? Because everything else gets the boost. The golden rule of architecture: MAKE COMMON CASES FASTER!!!
There are two types of people: those prepared for the zombie apocalypse and those who will be eaten.
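The trade-off described above (shorter stages raise the clock, but each hazard flushes more of them) can be sketched with a toy model. To be clear about assumptions: this is not the model from the Sprangle/Carmean or Hartstein/Puzak papers cited in this thread, the parameter values are invented, and the units are arbitrary. The absolute optimum it prints means nothing; only the shape of the curve is the point.

```python
import math

def time_per_instruction(depth, logic_delay, latch_overhead, hazard_rate):
    """Toy model: cycle time shrinks with depth, cycles per instruction grow with it."""
    cycle_time = latch_overhead + logic_delay / depth   # per-stage logic plus latch cost
    cycles_per_instruction = 1.0 + hazard_rate * depth  # each hazard costs ~depth cycles
    return cycle_time * cycles_per_instruction

def analytic_optimum(logic_delay, latch_overhead, hazard_rate):
    """Depth minimising the toy model, found by setting its derivative to zero."""
    return math.sqrt(logic_delay / (latch_overhead * hazard_rate))

# Hypothetical parameters in arbitrary time units.
LOGIC_DELAY, LATCH_OVERHEAD, HAZARD_RATE = 100.0, 2.5, 0.1

best_depth = min(
    range(1, 101),
    key=lambda d: time_per_instruction(d, LOGIC_DELAY, LATCH_OVERHEAD, HAZARD_RATE),
)
predicted = analytic_optimum(LOGIC_DELAY, LATCH_OVERHEAD, HAZARD_RATE)
print(f"brute-force optimum: {best_depth} stages, closed form: {predicted:.1f}")
```

In this model a better predictor is simply a smaller hazard rate, and since the optimum scales with the inverse square root of that rate, cutting mispredictions pushes the best depth deeper. That is the "fix the predictor, then lengthen the pipe" argument in one line of algebra.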
"As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls."
Sigh... most of the people I know cannot place the planets of the Solar System in their correct order. What a rarefied realm we inhabit here...
As the P4 EE proves, more cache is really important to the performance of the P4. Take a look at the specs of all but one of the new Prescotts' and you'll see that they come with 1 MB of cache instead of the Northwood standard of 512K.
That should at least allow Prescott to be on par with if not exceed the performance of Northwood. That said, I wouldn't expect it to be faster in everything. Those extra stages will hurt for certain functions no matter WHAT the cache.
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
While in the past, longer pipelines did cause slowdowns, Intel may be able to lengthen the pipelines by leveraging Hyper Threading (HT). http://intel.com/business/bss/products/hyperthreading/overview.htm
It's my understanding that HT uses the gaps in the pipeline as a second virtual cpu.
-Cho
They claim it's supposed to compare against the 'Thunderbird' model Athlon (the one that topped out at 1.4 GHz). Not the Duron.
Most people will keep matching it against the P4 regardless, of course. Will this continue to hold true against the Prescott (allowing AMD to hike up their PR numbers by a goodly amount), or will they stick to their supposed guns?
More info here.
Why would anyone engrave "Elbereth"?
intel is not just lengthening the pipeline for marketing reasons, they're doing it to open up another front in the battle for more performance. by lengthening the pipeline, they reap the benefits of higher clockspeeds at the cost of more expensive branch mispredictions, etc. however, all this means is that things like trace cache and improved branch prediction are more effective at improving performance. in other words, if both amd and intel were to devote the same resources to improving branch prediction and minimizing the penalties for branch misprediction, amd would yield less of a performance gain. occasionally, this will lead to oddities such as the p3 outperforming willamette on a clock-for-clock basis, but overall the philosophy is sound.
A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor [colorado.edu], ISCA 2002.
Let me guess...42?
Prescott will have 16KB of L1 cache (Northwood has 8KB) and 1024KB of L2 cache (Northwood has 512KB). These changes will most likely increase the performance per clock cycle.
Maybe the larger cache sizes will "make up" for the longer pipeline. I won't criticize Intel until I see benchmarks of 3.4GHz Northwood vs 3.4GHz Prescott.
So how long have you been an AMD fanboy/stock holder/employee? This is complete crap. The P4 performs very well in matlab or mathematica.
You're simply full of shit and yes, it's obvious.
Let me start out by saying I am a die-hard AMD fan. However, in the two most important processor measurements, MFLOPS (Million FLoating-point OPerations per Second) and MIPS (Million Instructions Per Second), the P4 excels.
:-)
In fact, even after overclocking my Barton 2500+ from 1.87 GHz to 2.45 GHz (the 3200+ is 2.2 GHz, mind you), I STILL can't touch a stock Pentium 3.2 GHz with 800 MHz FSB... or a 2.4 clocked to 3.2.
People say that a longer pipeline is going to cause a lot of problems and be slow, but this is not true for 3 simple reasons: more advanced Hyper-Threading, faster MHz, and larger caches. Honestly, any 1 of those 3 solutions would solve the problems, but Intel will be introducing all 3 of them, with the most obvious improvement coming from clock speed.
For single processor* number crunching, especially code that has been optimized with SSE2 goodness, you just can't beat the Pentium 4.
If you want to go dual proc or more for science... well, then there's nothing like an AMD Opteron.
I'm not going to upgrade my proc until Intel introduces "Socket T" and AMD introduces Socket 939 (I suggest you guys wait too), and both of them are running well and have proven to be highly overclockable.
"Comedy's a dead art form. Now tragedy, that's funny."
It's called the Pentium-M line. They perform very well per clock (based on PIII to some degree) and are low consumption.
Your EE or ME or ChemE full professor as a grad student could have written a FORTRAN program to compute some stuff and write output to a numeric text file or perhaps draw some plots using a subroutine library. You are probably thinking that anyone who can't sling together C programs using VI to draw graphics straight to X is a luser, but I am talking about pretty technically savvy people who don't have time to spend on this stuff and who employ armies of Engineering majors from foreign lands who are not up on this stuff either.
My own take is that if a particular numerical calculation can be easily programmed by some package, it must not be on the cutting edge of research because someone has already done it. Besides, if your software package is really deep, most of the effort goes into the architecture and the data flows and into graphics, and the RAD bit is only simplifying a tiny part of what you are spending your time on. A high-power scientific data visualization is really a video game, and how many video games are implemented in Matlab?
But what Perl is to text processing, Python is to collections, and VB is to slinging together a GUI, Matlab is to numerics (what used to be FORTRAN libraries) -- it may not have the best algorithms, but it has a lot of algorithms -- it has a semi-decent scripting language, and it has some facility with producing plots from your computations and other data.
Now that's the thing -- if you are doing matrix operations or using some canned function (most likely C under the hood), Matlab is as fast as fast can be. The minute you start looping in Matlab, it is interpreted and the speeds are in the Python range.
Before you knock it completely, it has very good integration with Java modules -- more seamless than with C modules. While Java may be pokey for its GUI, for tight numeric loops the JIT is almost as fast as C -- no joke, a person should consider writing numeric extensions to Matlab in Java of all things, especially on Windows where they tweaked up Java 1.4.2_03. And how many scripting languages (OK, Jython) have this level of Java integration?
But as a scripting language, Matlab has its shortcomings. It started out as a matrix calculator and has had features grafted on in a hodge-podge Visual Basic 6.0 kind of way. In terms of its data type restrictions and fubar scoping rules and brain-dead object extensions, I don't think, as they say, it scales very well.
My other peeve is that it is proprietary, and while Math Works is not Microsoft, I worry if engineering schools, emphasizing use of "commercial packages students will use in the real world when they graduate" (as opposed to professors dinking around with their homebrew software for use in instruction), are becoming trade schools shilling for the big software houses. I don't have a lot of experience with it, but in place of Matlab we should be using stuff like Python and the Python NumPy extension -- Open Source alternative, comparable performance, C extensions for speed, but much more Turing complete, consistent, and scalable.
And where is Matlab 6.5 using Java internally? Try doing a Files Open to start editing a Matlab script (M-file) with the Matlab editor window. One potato, two potato, three potato, and the window comes up. Now what language has that kind of GUI lag, I wonder what it could be?
As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.
Damn skippy most of us know that. The receptionist, the janitor and my Honda mechanic all pointed that out today. =)
I do not understand Intel nor other companies that do not try to develop anything besides x86
There are lots of architectures in use besides x86 from many different companies...macs don't use x86, nor do most embedded devices or really anything besides intel (and their clones) PCs. On the other hand, there are millions of PCs out there in a variety of applications and not just as desktops.
instead of coming up with something new, Intel tries to patch its products with more crap.
Backwards compatibility is why Intel has stayed in business. This is why x86 is still around. Given the sheer number of programs written for x86, changing architectures would cost hundreds of billions. It isn't just a matter of recompiling - many programs, especially older ones, rely on certain features and quirks that are found only in x86.
When Apple realized that MOS Technology's clone of the 6800 was not the best solution, the architecture was replaced with a new one that better suited Apple's goals.
Apple has nowhere near the market share that Intel does and their hardware has been tailored to a different crowd - mainly the home and graphic arts niche. x86 is everywhere - not just under your desk. There are billions of processes relying on it. To dump it outright would wreak havoc.
-
The Pentium M is a processor that runs at a slower clock speed than Pentium 4s with roughly equivalent performance. So Intel at least changed tactics in the mobile field.
I think the bottom line is performance... if they can keep reasonably close while continuing on the current path, they won't change. If some future chip ends up being horribly slow compared to others, they'll have to switch.
Now, there is another aspect that could make all this irrelevant... it is rumoured that the successor to Prescott might be multi-cored, i.e. multiple processors on one die. That would certainly give them a way to market a lower clock speed, if they have two or more CPUs in their chip.
The Quantium has the following new features:
Those awful common people don't share your interests, so they must be morons. I mean, obviously there is nothing more to human life than computer processors and astronomy.
I knew they were up to something when this mail appeared on the linux-kernel mailing list in 2000. 4.3 GHz, indeed!
The site seems semi-slashdotted. Half the time you'll get a "too many users" error.
By the way - women's breasts kick ass!
I decided to test Arctic Silver 5, Arctic Silver 3, OCZ Ultra II Premium Silver Compound, and CompUSA Silver Thermal Grease. This test was not conducted to test performance, but rather to determine if these compounds have Silver as an ingredient.
All Testing was done twice, once on a jeweler's acid free 'Black stone', and the test was repeated on paper. The testing solution was Nitric acid and Muriatic acid that was pre-mixed professionally.
The tests produced some very disturbing results:
OCZ Ultra II Premium Silver Compound and the CompUSA Silver Thermal Grease have ZERO silver in them.
The testing solution stayed orange - if it had any silver in it, the acids would turn varying degrees of red, depending on the purity of the silver present. OCZ claims that OCZ Ultra II Premium Silver compound is, "Made with 99.9% pure micronized silver, Over 70% silver content by weight".
I cannot concur and my tests conclusively show that there is Zero micronized silver present, and Zero silver content by weight.
Arctic Silver 3 and Arctic Silver 5 were also tested and both produced a blood red color, indicating 90% - 100% purity of Silver in both Arctic Silver 3 and Arctic Silver 5. Arctic Silver's claim of, "Contains 99.9% pure silver" by my testing is accurate and of the compounds tested, only Arctic Silver products produced results showing that Silver is in fact present.
The tubes in the picture below from left to right, Arctic Silver 5, Arctic Silver 3, OCZ Ultra II Premium Silver Compound and CompUSA Silver Thermal Grease.
In picture 3 below, from left to right is Arctic Silver 5, Arctic Silver 3 and OCZ Ultra II Premium Silver Compound. The compounds were placed on the paper and the acid was placed on the compound undisturbed. Notice how the acid drop placed on the OCZ Ultra II Premium Silver Compound remains orange, indicating zero silver present:
When you go into a jewelry store and buy a sterling silver or a fine silver necklace, you expect the jewelry to be made of sterling or fine silver. The same should apply to silver thermal pastes - if the silver paste has no silver in it and the manufacturer says it does, that is misleading.
Based on my testing, I can not recommend OCZ Ultra II Premium Silver Compound or CompUSA Silver Thermal Grease, as they are both misleading products with zero silver in them. If you want a product that actually has silver as an ingredient, Arctic Silver 3, Arctic Silver 5 or Arctic Silver Adhesive tested OK.
Ed Note: Silversinksam's conclusions have been verified by an independent testing laboratory - details will follow in Part 2 of this article.
I look forward to the Pentium X that will have an infinite pipeline, infinite clockspeed, and get nothing done at each stage!
You have to remember that a garden variety PC is a very unpredictable environment. You have network packets coming in, mouse events, keyboard presses, USB chatter, DMA access; every event generates an interrupt that requires the processor to stop what it's doing, and start the pipeline over again.
It's nothing personal, but articles like this one, as well as posts like this, drive me absolutely batty with the amount of incorrect ideas propagated. It's not that one particular person is misinformed -- it's just that the amount of generally bogus information is silly.
First off, at some point, as far as I can tell, a bunch of people read Maximum PC or somesuch consumer "PC enthusiast" magazines, and read about "The Megahertz Myth". Maybe Ars Technica ran the story that started all this. Heck if I know. All that the original author was trying to do was point out that people shouldn't judge processors strictly by clock speed.
Boy, did they ever create a monster. Somehow, a bunch of folks managed to get the idea that Intel was pulling this as some sort of PR job to deliberately trick people into buying their processors. For Chrissake, this is such an incredibly stupid idea. The OEMs have purchasers that know what they're buying. Not only are they not going to just sit down and look at benchmarks, they're going to have a bunch of test machines built when deciding what to go with. That and business considerations outweigh any "MHz rating". The OEM market just plain doesn't care. The only people getting excited about the "MHz Myth" are the "PC enthusiasts", a tiny, tiny sliver of a group when it comes to dollar value. If the sort of "PC enthusiast" riffraff really think that they constitute any kind of a significant market to Intel -- enough for Intel to *redesign their entire processor*, using a longer pipeline and higher clock rate, around getting them to purchase a computer, they are vastly overestimating their own importance in the universe.
When Intel makes the decision about a new processor, it's a pretty safe bet that they don't run out and say "Gee, how would Joe Assmunch in Marketing like us to structure this thing?" They have many, many PhDs in chip and circuit design who have many competing ideas about what the best designs would be. They run many, many simulations before even thinking about deciding on major design decisions.
The "PC enthusiast" folks who think that Intel has taken this path to trick those people that buy from Dell, and that, ho ho ho, *they* are smart enough to see through the trick are ridiculous. If Intel wanted a high clock rate to put on stickers, they could jack the thing through the sky, run at 10GHz, then demux data and only accept data at a lower rate into the various units. Some of the units would move to even more instructions per cycle.
The *current* poster is talking about *keyboard* and *mouse* events? "USB chatter"? Those don't even show up on the *radar*. You roll that mouse, send your 200 Hz interrupts, and you worry about 200 measly mispredictions per second? Just blowing away the page table cache during process switches (which runs at 100 Hz on Linux 2.4 x86 by default) already dwarfs any misprediction performance hit from the said devices, and folks frequently bump it up by an order of magnitude or so and don't see any measurable performance hit -- on Pentium IIs.
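For a sense of scale, the arithmetic behind "don't even show up on the radar" can be sketched as follows. The 200 Hz figure is from the post; the 30-cycle flush penalty (borrowed from the rumoured stage count) and the 3 GHz clock are assumptions picked for illustration, and the real cost of servicing an interrupt is larger than one pipeline refill.

```python
# Rough estimate of how many cycles device interrupts cost via pipeline flushes.
# penalty_cycles and clock_hz below are assumed values, not measured ones.

def fraction_of_cycles_lost(events_per_sec, penalty_cycles, clock_hz):
    # Cycles lost per second divided by cycles available per second.
    return events_per_sec * penalty_cycles / clock_hz

MOUSE_INTERRUPTS_PER_SEC = 200   # from the post
FLUSH_PENALTY_CYCLES = 30        # assumption: one refill of a 30-stage pipeline
CLOCK_HZ = 3.0e9                 # assumption: a 3 GHz part

lost = fraction_of_cycles_lost(MOUSE_INTERRUPTS_PER_SEC,
                               FLUSH_PENALTY_CYCLES, CLOCK_HZ)
print(f"{lost:.6%} of all cycles")
```

Under these assumptions the loss is on the order of two millionths of the available cycles, and even a penalty a hundred times larger stays well under a tenth of a percent.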
As for DMA, the entire point of DMA is so that the processor *isn't* running code from the host. It can continue on in its own happy little world while a co-processor pokes at the memory bus.
You might see significant branch misprediction issues with an inner loop with a branch statement that flicks back and forth just about every loop or so to screw over the branch caching. And "significant" is still pretty minor. The compilers hint to the CPU whether a branch is likely to be taken...it's not as if there's this massive, awful mistake that all the chip designers in the world are making that Joe I-Built-My-Own-Computer-
May we never see th
If you have a bunch of steps to do on each of a million pixels, your best bet is to do them all on the current pixel, then advance to the next one -- which keeps everything in cache. But the vectorized languages tend to do the first step to all million pixels, then do the second step to all million pixels, etc. That swaps everything out to RAM every time, so you're running at the main bus fetch/write rate, not at the CPU's clock speed.
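The two orderings described here can be sketched like this. The function names and the three example steps are invented for illustration, and plain Python lists will not reproduce the cache behaviour of compiled code, so this only shows the difference in access pattern, not the speed difference.

```python
# Two ways to apply the same sequence of steps to every pixel.
# Both give identical results; they differ only in the order memory is touched.

def pass_by_pass(pixels, steps):
    # "Vectorized" style: sweep the whole array once per step.
    data = list(pixels)
    for step in steps:
        data = [step(p) for p in data]   # one full pass over all pixels
    return data

def fused(pixels, steps):
    # Per-pixel style: finish all steps on one pixel before moving on.
    out = []
    for p in pixels:
        for step in steps:
            p = step(p)                  # value stays "hot" between steps
        out.append(p)
    return out

# Hypothetical processing steps, purely for demonstration.
example_steps = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]

if __name__ == "__main__":
    pixels = range(10)
    print(pass_by_pass(pixels, example_steps) == fused(pixels, example_steps))
```

With a million pixels and an array bigger than the cache, the first form streams the whole array through main memory once per step, while the second touches each pixel once.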
I'm kind of tired of you armchair OS coders. So the happy few, highly paid Microsoft employees, 20 years experience in copying IBM, thousands of stock options in Redmond decide the next gen OS will have some wack FS and they have to be called morons? How do you know better? Hasn't Microsoft produced the best selling OS on the market for 15 years? Why don't YOU have the job leading the Longhorn team?
Oh. Yeah... LINUX.
There's a rather crucial difference here. You are very much not Linus. As a matter of fact, I will happily bet that you have never submitted a patch to Linux. Neither have most of the people on Slashdot vicariously living through Linus's triumphs.
Linus might, in fact, be a not unreasonable critic, to some degree, of the pipeline length being discussed in the article. He, unlike most Slashdotters, is actually familiar with (a) high level system architecture, (b) the x86 instruction set that's being used here, and (c) probably, by virtue of enough low level work, (plus, he may have potentially done work in the field) at least something about the code gcc and other compilers are spitting out.
Here's some metric of how far most Slashdotters are from being qualified to comment on this: I'm probably reasonably knowledgeable on the issues involved relative to the bulk of Slashdotters, using only the other posts as a watermark for what other Slashdotters know.
One assignment I had, back at Carnegie Mellon as an undergrad in a CS class, was to design a very simple processor. The assignment would have made a CE laugh out loud -- we didn't have to worry about gate delay or initial states or inductance or anything that's an issue in the real world. We got to ignore all timing problems. We could split signals as much as we wanted. It was strictly a logical processor. That processor is probably more than most Slashdotters have built.
I have a friend in grad school who is a CE. He laughed his ass off when he heard about the assignment. He, however, is a grad student. He hasn't published for years, he doesn't have much experience, and he has a *lot* to learn. He designs chips on a pretty regular basis.
If *he* came along and said "those people at Intel are idiots", *he'd* be laughed down by a PhD in the field. Why? Because he simply hasn't done research in branch prediction or any of the related issues, and isn't remotely qualified. Of course, *he* wouldn't be out trying to call out the Intel engineers because he's aware of how competent they are.
As a matter of fact, anyone who hasn't either gotten into serious compiler design research (not just "I wrote a compiler once for a class") or whatever areas are relevant to the CE chip work probably isn't qualified to criticize the Intel engineers on design decisions. You aren't seeing a lot of those in this article. Why? Because they have a fair amount of respect for the Intel folks and know enough to avoid making damn fools out of themselves.
Folks who are end users -- software engineers, system builders, etc -- are really qualified to judge the processor on price and performance as a black box, and not much else. I include myself in there, and I've read a number of research papers on the thing in the past. And, frankly, the Intel folks have done a pretty good job if you measure the product as a black box from a performance standpoint. If you want to complain, complain about the price. Don't try to say "Well, that P4 sure is good, but what the Intel engineers really messed up on is that HyperThreading stuff. Boy, if they only understood PCI, then they never would have done anything like it." I see way too many completely and utterly uninformed posts, and it propagates.
May we never see th
Ok, someone please mod the parent -1 clueless!
First off, the P4 has about the best branch predictor in the business. The only processor that had a better branch predictor was AMD's K6 (and the NexGen chip from which the design originated).
As for the "slowness" of the P4 vs. the PIII, in many applications the P4 is actually FASTER, clock for clock, than the P3. This was even true back in the days of the "Willamette" P4, and especially true for the "Northwood" P4. Plus the P4 core clocked nearly twice as fast on an identical fab process (2.0GHz vs. 1.13GHz, the top speeds produced for both cores at a 180nm node), so who cares about clock for clock! Dollar for dollar the P4 was WAY faster.
And broad is better than deep because the CPU is doing many things at once? First off, the CPU can only handle ONE task at any given time unless you support SMT. Intel supports SMT ("hyperthreading" in Intel-speak), AMD does not at this time. But more to the point, both broad and deep give you certain advantages and disadvantages with regards to keeping your execution units full, but both requirements end up being pretty similar in the end, i.e. you need instructions that do not depend on the results of previous instructions.
Now in your third paragraph things start getting really sketchy. First off, AMD's highest market share numbers ever were back in the late 386/early 486 days when they hit 30%. With the K6-2 AMD managed up to about 20% market share. Now they're sitting at about 15-17%, depending on who you ask, and they've been there for a while. The Q4 reports from Intel and AMD tend to suggest that Intel actually gained market share from AMD over the past 3 months, though AMD's average selling price went up a lot more (so their total revenue increased more than Intel's).
Of course, the part about AMD's 24-bit FPU must be when the crack really kicked in for the original poster. The K6, like EVERY OTHER FRIGGING X87 FPU EVER PRODUCED, had an 80-bit floating point unit! The K6-III did not make any changes to the FPU (though the K6-2 made a very minor change to how it handled the FXCH instruction, which mainly helped performance in Quake). These chips also did not have any serious compatibility problems, though there were the standard few errata that you get with any modern x86 processor (whether made by Intel, AMD, VIA or anyone else). I can only remember one erratum in the K6 line of chips that ever really caused problems, and that was fixed pretty early on with a new stepping of the original K6. There were also timing loop bugs that caused some problems, but that was the result of dumb-ass software; the hardware performed exactly as expected.
(Sorry about not having a link. I'm not at a location with IEEE Xplore access.)
No problem. Neither are we...
So once again Intel goes for the petrol guzzler rather than efficient design ? It makes me sad to see that current P4 3.2 GHz can consume 100W vs. 12W for a G4 PowerPC 1 GHz ? I'd rather have a couple of G4 at 12W each and save wasting energy. In fact I've seen a few 1GHz chips around 7-10W, why not use 2 or 4 of them instead of one behemoth ? Efficiency and simplicity in design are surely better than wasted energy for the sake of it.
"Further contributing to the MHz Myth..."
Nothing like a load of bias to fill out an introductory clause.
Wasn't Prescott also the CPU for which branch prediction was going to be moved into the compiler (i.e. the compiler has to generate code that tells the CPU which way the branch will probably go)?
I reckon a compiler could do a much better job predicting branches (it can do much more analysis than would be worth doing at run time).
If all this is true, longer pipelines may have a less severe effect than they have had up to now.
IANAE(xpert), though.
Please correct me if I got my facts wrong.
First, I want to debunk some myths about branch misprediction stalls. From the Pentium Pro onward, all the stalls in the pipeline added together came to more than double the CPI of the machine. That's a result of out-of-order (OoO) execution: when you have a large OoO window, any single type of stall means less to you. Your execution engine probably has some instructions from BEFORE the branch that were still stalled when the branch executed, so the stall costs don't simply add up. The second point is that a larger OoO window means a LOT more work, so the P4 has to do more work to manage its large OoO window, and that large window is a BIG part of the long pipeline. But the benefits of a large OoO window are more related to memory latency: it gives the memory subsystem more known addresses to handle in the memory pipeline, which means better scalability when increasing core execution capabilities [either width or depth (= clock), it doesn't matter].
Now the P4 shines at dealing with memory latencies, and that's more important than just a branch misprediction. Also, Prescott has a superior branch predictor, so the reduction in mispredicted branches offsets the larger misprediction penalty. The L1 and L2 caches are doubled, which means Prescott's integer performance improves because fewer critical code paths have to go out to the L2 cache. And the increased L2 cache reduces the amount of time spent waiting on memory, so I'd say Prescott should do just fine on the IPC side of things as well.
Overall, the P4's system design isn't as good as an integrated memory controller, but it does help to tolerate latency a lot better than the PIII or Athlon designs. And overall, when most of the time is spent waiting on memory references, the narrow and deep approach seems slightly better. [Narrow reduces the cost of increased reordering capabilities. Don't bring up the G5 here: they patented a clever trick and are RISC, for Christ's sake, so they can be wide and deep for improved reordering.]
The thing I'd want from AMD would be, more or less, one additional pipeline stage for better reordering capabilities.
Emacs is good operating system, but it has one flaw: Its text editor could be better.
I'm gonna convert to Power PC machines. RISC. Mmmmmm RISC is good...
However, it should be noted that during the time the Athlon XPs were introduced, so was DDR memory, which helped with the bandwidth issues.
Explain my Athlon 1.33GHz with DDR.
The second generation Athlon (Thunderbird) had DDR support, at 200 or 266MHz.
AthlonXP is the third generation Athlon, and IIRC the names are based on Thunderbird speeds.
I may be wrong about the generations, but I know my PC had DDR266.
It's never too late to have a happy childhood.
Not only is price vs. performance very good with AMD; AFAIK AMD is not in the habit of putting out new processors that are crippled vs. their predecessors. And while Intel is focused on keeping 32-bit processors on its roadmaps, AMD is being aggressive with moves to 64-bit computing; even its long-term plans show it making fewer 32-bit processors in the next few years.
And what's up with Intel adopting the format that Alpha used with their processors?? It was always a royal PITA and I've never had any issues with the current system or are there that many id10t's out there that are just not careful with those pins?
Started to pull away in quality??? I would say that Apple has always been ahead in quality (sure there have been quality issues - but most Apple buyers are assholes and attorneys)
Apple has always been better (easier, faster) at individual tasks such as audio editing, video editing, and overall ease of use. They also FAR surpass ANY PC in terms of longevity ... I can count on my fingers the PCs from 1984 I know of that still work, are still in use, or for that matter are still usable. I see Mac SE's from 1985 almost every day - still working after 19 years!
The megahertz myth is honestly less about performance as a whole - it is about performance perception.
Case in point - two computers are taken into a retirement home. (An iMac and similarly priced/styled PC) They are told where the internet connection is and told to get onto the internet - the average Mac is up in 17 minutes out of the box - the average PC is never able to get online or goes well beyond an hour. Conclusion - ease of productivity - is perceived speed.
Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
But here's a tip for you: avoid loops if possible.
Amen. I've actually had this emphasized to me in my University statistics class, digital comm classes, and at my internships. Vectorize everything and it's smoking fast. Loop and you're done for.
Why else would they name it after the lardarse, ex-boxer, mulleted-Welsh-pikey-thumping British deputy PM?
When I am king, you will be first against the wall.
"...will have a longer pipeline than Northwood. "
I know high school was a long time ago, but seriously.
If Intel had cancelled the Pentium Pro based on people whining about how it was different from the Pentium, and didn't stack up, where would the P3 be now?
Vintage computer games and RPG books available. Email me if you're interested.
Thanks for the digression. This guy's story is obviously anecdotal evidence, and therefore a fallacy.
Yawn. Interpretation of blurb: blah blah blah *game geeks* blah blah blah blah *money money* blah blah. Blah blah blah bah *we can't think of anything new* blah blah blah blah blah blah blah blah blah blah.
This is all 4004 shit. Who the hell cares?
PPC. Say it: PPC.
The Opterons do slightly better, as the memory controller is on-chip.
The Raven
For once, the correct information is listed. (Rather than just making stuff up, and stating it as fact)
Um, isn't that handled by branch prediction? If the processor knows it has just branched the same way for the last 5,000 iterations, then it will correctly predict the branch target each time. The only time it misses in your example would be at the end of the last iteration.
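A minimal simulation of that argument, assuming a single textbook 2-bit saturating counter for the branch. That is a classroom scheme chosen for illustration, not a description of the predictor in any actual P4 or Athlon, and the function name and starting state are made up for the sketch.

```python
# One 2-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken".
# This is a textbook scheme, not the predictor of any particular CPU.

def count_mispredictions(outcomes, counter=2):
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Loop-closing branch: taken 4,999 times, then falls through once at the end.
loop_branch = [True] * 4999 + [False]

# Worst case for this simple scheme: a branch that flips every iteration.
alternating = [True, False] * 100

if __name__ == "__main__":
    print("loop branch misses:", count_mispredictions(loop_branch))
    print("alternating misses:", count_mispredictions(alternating))
```

The loop-closing branch only misses on exit, while the alternating branch is wrong half the time under this simple counter. Predictors that track branch history can learn a regular alternating pattern too, so this counter overstates the damage for that case.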
Matlab now uses the Java JIT to speed up loops.
I've had some loop-heavy code speed up by a factor of 10. There is also some speedup when m-files are compiled to C and native code in some cases as well. The interpreter is not real fast but it's fast enough, esp. if you do things the 'Matlab way' and heavily vectorize your code. The Mathworks plans to make Matlab ultimately as fast as C or Fortran.
Matlab works (and in fact I've done my entire physics PhD thesis with it, including instrument control) because it's easy enough to learn, has good graphics/plotting/curvefitting capability, the matrix model maps well onto typical scientific computing needs, and because more sophisticated languages are not necessary for 95% of data analysis and numerical computation. The toolboxes are quite useful as well and it's 'fast enough'. The friendly GUI and help system are a big bonus too, and I work very, very productively under Matlab.
NumPy, GSL, and Python are great but have a steeper learning curve and aren't widely used in most physics depts.
thanks
If the 15-stage Athlon beat the 20-stage P4, is it possible for Intel to build two competing pipes that could execute both branches of a true/false if?
In single decisions, wouldn't that be likely to improve performance dramatically?
Dunno if it is possible, but just a thought....
-l