AMD Showcases Quad-Core Barcelona CPU
Gr8Apes writes "AMD has showcased their new 65nm Barcelona quad-core CPU. It is labeled a quad-core Opteron but, according to InfoWorld's Tom Yager, is really a redefinition of x86. Each core has a new vector math processing unit (SSE128), separate integer and floating-point schedulers, and new nested paging tables (to vastly improve hardware virtualization). According to AMD, the new vector math units alone should improve floating-point performance by 80%. Some analysts are skeptical, waiting for benchmarks. Will AMD dethrone Intel again? Only time will tell."
Things have come a long way since the heady days of bit slice processors. The first microcode I wrote was for an XOR operation - I could not think of anything simpler, that would actually do something useful...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Anyone know what "SSE128" means? SSE registers have been 128 bit from day one.
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
Obsession about process size is sillier than obsession over clock speeds.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
Advanced users are users too!
"Will AMD dethrone Intel again?" Dear AMD, meet Larrabee: http://www.theinquirer.net/default.aspx?article=37548
AMD might kick Intel in the nuts a little, but it definitely won't dethrone them.
As long as AMD and Intel continue to chase each other in the x86 market, high end chips become low end in the span of six months. Just keep buying 6 months behind the press releases and you get great processors for next to nothing.
Weaselmancer
Ridiculous.
"Let's make an octa-core processor!"
I read the Tom Yager piece and I started drooling so much I shorted out my computer.
I'll just hit the Submit button now and see if I can get this posted to Slashdot before the drool shorts out this computer too.
Am I missing something, or am I completely wrong?
... and you can wake me up when the processors themselves are smaller than a square millimeter. I need them for my new thumb-top computer design.
Keeping to scientific fact, how much heat has to be generated for 1 MIPS?
The fact is, absolutely none. It has been shown that only the destruction of information, via AND and similar instructions, creates entropy (heat). As long as you use only 3 types of gates (pass-through, NOT, XOR), you can create a heat-free CPU. Provided we do want to check for bit errors, we could maintain very low heat via ECC-like checking. Estimates put that at 10^8 times lower than at present.
We could keep 98% of the efficiency of current-day chips if we switched to this method.
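For scale, the bound usually cited in this kind of argument is Landauer's limit: erasing one bit costs at least k_B * T * ln 2 of heat. A rough sketch, assuming room temperature (300 K); the erasure rate in the usage note is a made-up figure for illustration, not a measurement of any real chip:

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def landauer_joules_per_bit(temperature_kelvin):
    """Minimum heat released when one bit of information is erased."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


def min_erasure_power_watts(bits_erased_per_second, temperature_kelvin=300.0):
    """Lower bound on dissipated power for a given erasure rate."""
    return bits_erased_per_second * landauer_joules_per_bit(temperature_kelvin)
```

At 300 K this comes to roughly 2.9e-21 J per bit, so even a hypothetical 1e18 erasures per second would have a floor of only a few milliwatts. Whatever a real chip dissipates above that is implementation overhead, not a thermodynamic requirement.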
....is in the eating. Until we see benchmarks everything is just ignorant speculation.
In my own benchmarks (generic C integer and floating point scientific code) I have found that the Core Duo and Core 2 Duo aren't all that quick compared with an AMD64. Clock for clock the AMD64 Opterons we have are about 50% quicker than an equivalent Core 2 Duo for integer work. I know this doesn't agree with all the usual magazine benchmarks but they are heavily biased towards using SSE instructions where possible and it is SSE where the Core 2 Duo has been a real improvement over previous Intel designs and also bests the AMD chips. Hopefully, AMD has recognised this and the new SSE implementation will bring them back on par with Intel for these benchmarks but even today an AMD64 processor is a beast and more than a match for anything Intel produces.
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
I would care. POWER CONSUMPTION/EFFICIENCY. If I want a space heater, I'll stick with a 3.4 GHz P4 with HyperThreading. I DON'T WANT ONE. As it is, for what I like doing and for what I want to do, current-gen processors work just fine. I can play my games, make my music, draw shit, upload data, and check out sites like this, while maintaining my bank account, talk with other people, and more, at the same time. I got over the clock speed thing the second I actually owned a G3. Granted, Windows emulation sucked balls, but what was made for that computer, it once again, did what I wanted it to do. Sometimes better, sometimes worse. YMMV. If anyone wants bragging rights of ANY sort, the technology to shrink the die-size is the way to go. Go read a book called "Nanotime," IIRC. THAT is what I'd like to see, minus superdense C4 that's equivalent to a mini tacnuke, and those insane driving laws and shit.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Come on, AMD guys!
Get the hardware encryption support licensed from VIA and include it; that will be needed in the coming years too!
But will it run Linux?
Tubby or not tubby. Fat is the question
No, but a good, hard, well-aimed, hold-nothing-back kick in the nuts can leave them impotent, so they'll have to go through some ugly procedures to survive it in the long run. A couple of identical blows in the meantime could leave them sterile. If the current setups then begin to die out and Intel has no more babies waiting, they will not be dethroned, but they will be getting an honorable mention in the history books.
If you don't like my sig then don't read it.
Clock speed doesn't mean crap anyway. It's all in the code. I see guitar tuning programs for the computer... TEN megs in size, slow as hell, and inaccurate! I believe APTuner is FAR smaller than most, faster and far more accurate. People just don't know how to code, plus the fastest ways to code are copyrighted, which they shouldn't be, since they'd be utterly obvious to any programmer with that standard "ordinary" knowledge in that language, so one has to make workarounds that inevitably end up being slower. No more oldskool hacker ethic, now it's greed.
Insightful? Just because you can't correlate die size to benefits doesn't mean the rest of us can't. Get off "news for nerds" and go to "news for regular Joe" instead.
Less power dissipation, higher clock speed....
45nm is not inherently "better" than 65nm any more than 3GHz is inherently "better" than 1GHz. A smaller process size is a means to an end; it's not an end in itself.
The end is the delicate balance of improving performance per watt while increasing overall performance and keeping the price down. If AMD can deliver a chip that does a better job of that at 65nm than an Intel 45nm one, then the AMD chip is not somehow "worse" than the Intel one just because it doesn't use 45nm. That's just stupid.
I'm not saying AMD can do that, but I think that criticizing them for not being ready for 45nm yet is more than premature.
AMD's actually guilty of the same flawed logic though: their criticism of Intel's 4-core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
Read the article: that is an x86 GPU; it wouldn't be able to compete with general-purpose CPUs.
Three quad cores for the pasty-nerds under the sky,
Seven for the WoW-nerds in their halls of stone,
Nine for Diablo Men doomed to die,
One for the Dark Nerd on his dark throne
In the Land of Silicon where the corporations lie.
One quad core to rule them all, One quad core to find them,
One quad core to bring them all and in the darkness bind them
In the Land of Silicon where the corporations lie.
He paused, and then said in a deep voice,
This is the Master-quad core, the One quad core to rule them all.
Who says news can't come from an advertising section? It's still a source of information...
I don't think I've ever read such an admiring review of a CPU design. The last time I remember a chip sounding so fantastic was the Alpha, something like ten years ago. If all these new things really work the way they sound in theory, then yeah, I guess it's evident this Barcelona thing is really going to be something else.
The design for VW performance sounds extra interesting
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail.
This is Slashdot. We care about those details. You can read more about the "super fast, super cool, super cheap!" market speak on the company's official press releases section.
AMD showcases quad core Barcelona CPU? This is old news from last year. The PCWorld article linked to by this story is dated November 30, 2006.
I want a floating quad.
sometimes, nothing.
I'm not kidding. In the SSE I'm familiar with, one of the input registers is always an output register, which means its contents are destroyed. Another flaw is that there aren't enough registers... SSE uses 8, where 32 are commonly not enough when latency is longish (especially with SoA-style programming, where pragmatically a single vec3 occupies 3 128-bit registers).
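To make the register-pressure point concrete, here is a small sketch of the SoA layout, with plain Python lists standing in for 128-bit registers of four floats each. The helper names are made up for illustration; this is not SSE code, just the data layout:

```python
LANES = 4  # four 32-bit floats fit in one 128-bit register


def aos_to_soa(vectors):
    """Turn four (x, y, z) tuples into three 4-lane 'registers'."""
    assert len(vectors) == LANES
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    zs = [v[2] for v in vectors]
    return xs, ys, zs


def dot4(a, b):
    """Four dot products at once; a and b are SoA triples (xs, ys, zs)."""
    ax, ay, az = a
    bx, by, bz = b
    # Lane i only ever combines with lane i, which is what makes SoA SIMD-friendly.
    return [ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i] for i in range(LANES)]
```

Two vec3 operands in this layout already occupy six of the eight registers before any temporaries are allocated, which is where the squeeze comes from.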
... or Madd. You know, multiply-add. Does it have that?
Why? It seems to me that there's far more of interest in the design of this chip than the process size.
I will not be surprised if AMD dethrones Intel again. It is a classic Intel vs. AMD battle...
Intel comes up with some hare-brained scheme that "More is better!" (like Viagra). They design something new and decide to make it faster (or in this case just glue more of them together). Back in the day it was the "GHz"; now it's all about how many "Cores" you've got. This tactic seems to suit Intel quite well and dethrones AMD for about a year and a half... During this time AMD massively redesigns their chips to integrate new, emerging technologies. The gamers and server operators of the world sit by their AMD chips knowing that they might not have the fastest chips for the time being, but they are more technologically advanced.
Intel keeps cranking out their "Viagra" chips that become hotter and bigger energy hogs. When they finally come to the realization that their product SUCKS (as with the P4s), AMD is already a step ahead and swiftly takes the market back with their chips, which have always been a step ahead technologically. Intel scrambles, does a redesign, fails, tries again, and history repeats. Intel's "Viagra" mentality causes their chips to be more expensive, while AMD's "slow and steady wins the race" mentality allows them to keep their prices down.
--
Get your facts first then distort them as you please -Mark Twain
So now I'll see four penguins at startup!
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
What really interests me is how it compares on single- and double-precision calculations. If AMD gets into the range of Itanium performance, will Intel follow and kill their own Itanium by boosting Core 2 FP?
Feature size has dominated progress (as measured either by raw performance or performance per watt) over an unbroken 30-year period. Do you recall the very passionate debates about RISC vs CISC? Did a RISC design at one feature size ever beat a CISC design at the next shrink? I think not. Design has never mattered anywhere near as much as feature size. Not that you can't get design wrong. But then you can get a shrink wrong, too, and end up with 1% yields. AMD managed briefly to remain competitive with Intel while playing a full shrink behind, when Intel did that rather stupid marketron-driven face-plant into the thermal wall (against good advice from their Israel team, who later came to the rescue with Core Duo).
With the recent skyrocket of leakage current, the holy grail of feature size is somewhat tarnished, but it still dominates the performance curve. You completely missed the relationship between feature shrinks and the performance crown. If Intel has better process technology than AMD (almost always) and AMD has a better design (most of the time since the Athlon was first launched) and both companies shrink every 18 months following the Moore projection (that unbroken 30 year historical trend) and AMD always shrinks 9 months behind Intel, then the performance crown will pass back and forth exactly as often as either company announces their next product.
So I agree with you: feature size has no importance to the customer who wants performance for their dollar. Except that you can set your clock by it and project ten years into the future effective performance levels of shrinks we haven't even seen yet. Except for that part, yeah, I'm with you.
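The back-and-forth described above falls out of a very small model. A sketch, assuming each shrink multiplies performance by a fixed factor and AMD's design is worth a fixed factor on the same process; the 1.4x and 1.2x figures are invented for illustration, as is the function name:

```python
def crown_timeline(total_months, shrink_gain=1.4, amd_design_edge=1.2,
                   cadence=18, amd_lag=9):
    """Return (month, leader) at every shrink announcement."""
    intel_dates = range(0, total_months, cadence)
    amd_dates = range(amd_lag, total_months, cadence)
    timeline = []
    for month in sorted(set(intel_dates) | set(amd_dates)):
        intel_shrinks = month // cadence + 1
        # Floor division makes this 0 until AMD's first shrink lands.
        amd_shrinks = (month - amd_lag) // cadence + 1
        intel_perf = shrink_gain ** intel_shrinks
        amd_perf = amd_design_edge * shrink_gain ** amd_shrinks
        timeline.append((month, "Intel" if intel_perf > amd_perf else "AMD"))
    return timeline
```

With these numbers `crown_timeline(36)` alternates Intel, AMD, Intel, AMD at months 0, 9, 18 and 27. The alternation only holds while the design edge is smaller than one shrink's gain; make the edge larger than the shrink gain and AMD keeps the crown throughout.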
Well, until you show us your source code those numbers are as believable as anything else one might randomly type here...
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Look. He's talking about something interesting, and the conversation started on the theoretical limits of computation. Thus being in theory land is not unreasonable. Unless you actually have something interesting to say with some actual data to back you up, quit bashing someone who actually DOES have ideas and knows what the fuck they're talking about.
A few years ago people would've said that it was impossible to pull data from below the noise floor in radio channels, but we do it routinely now. People said that OOP was just theoretical wankery, but now Java is (to my dismay) one of the most popular languages around. Same for relativity, but we have satellites. Where do you think processors started? You think someone went out and said "I think I'll make a few-micron-thick wafer of silicon and get a processor"? All of our advancements in science have started out as theoretical concepts, and slowly they've become practiced, then practical, and then commonplace.
To be fair, I suspect that the resulting information from the computation itself carries... well... information. As all good computer scientists know, information has a direct correlation to entropy, and in fact this is how we define randomness. How much entropy is a simple function of the information actually obtained (i.e. how accurately we can predict each bit). Given how closely all of these things are tied, I would be surprised if this entropy didn't somehow "count" towards the physical entropy of the system. Still, if it took a long, complex computation to get those bits, it's not that much entropy lost to information (like, say, a yes/no answer to a very difficult, exponentially hard problem on a large N). It SEEMS like if you reverse the process it should take energy to do so. But I don't know; it also seems like light shouldn't interfere with light, and dark matter should run into other dark matter, and you shouldn't be able to do computation by repeatedly not looking at bits until you have what you want (i.e. quantum computation, which we can already do, just only 5 bits at a time). Who knows... maybe this is feasible. Unless you've got the physics background to prove him wrong, or to make a useful contribution, quit pointlessly spewing shit at people who have useful things to say.
Cool - a quad-core CPU, and my internet connection is still in the dark ages.
Yep, 800Kbps down. Looks like my CPUs are going to sit and twiddle their thumbs for a while, waiting for the data.
Is this yet another "computers are like cars" comparison?
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
They care. Just moving the chip from 65 nm to 45 nm means you can produce twice as much on the same silicon wafer. Also, if a 65 nm chip performs well, then a 45 nm version of it (with slight modifications of course) will work even better.
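The "twice as much" figure is just area scaling. A back-of-the-envelope sketch, assuming an ideal linear shrink of every dimension and ignoring wafer-edge loss, yield, and the parts of a chip that don't scale (function names are made up here):

```python
def area_scale(old_nm, new_nm):
    """Relative die area after an ideal linear shrink."""
    return (new_nm / old_nm) ** 2


def dies_per_wafer_ratio(old_nm, new_nm):
    """How many times more dies fit on the same wafer after the shrink."""
    return 1.0 / area_scale(old_nm, new_nm)
```

Going from 65 nm to 45 nm gives an area factor of about 0.48, so a little over twice as many dies per wafer in the ideal case.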
Opus: the Swiss army knife of audio codec
Likely Intel has an edge because they are [almost] ready for 45nm process, while AMD is just getting started on 65nm.
But it is interesting to see the two companies approach the problem from different ends. Do you improve the silicon process or do you alter the architecture and instruction set? I bet you the best answer will be to do both.
Quad cores that actually share cache would be nice. These double duals kind of suck because architecturally they can never share cache. Then again, AMD and Intel don't have many dual cores that can even share cache within themselves (although I think Intel is releasing one soon?).
“Common sense is not so common.” — Voltaire
You haven't really added much from your original post. Die shrink is an implementation detail, probably something you read that sounded "futuristic"... The real goals are performance, and (usually secondarily) power consumption. Doesn't really matter how they achieve those goals.
I agree with you on one point - I think as with your requirements, the goal for the average non-technical home consumer should be focused more on efficiency than multi-core 64 bit 4MB cache, etc. But not everyone spends 95% of their time browsing the web and typing in their casserole recipes. Try to run a parallel make on a large project and you immediately appreciate a processor with a high clock rate, multiple cores, and huge cache. I don't need to read sci fi to know what I need to do my REAL job.
As an investor in the semi business I very much care about process sizes. One of the larger reasons Intel makes money while AMD does not is that Intel can produce more chips at less cost.
AMD's actually guilty of the same flawed logic though: their criticism of Intel's 4-core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
I don't think it's about that. Intel quickly pumped out something that looks like a 4-core CPU, which took far less time than developing a new quad-core design, and that makes AMD seem to lag behind. So what can AMD do to explain? Not much really, besides what they said above: that Intel got there quicker because it's not really a quad-core design. Everyone will have their own multicore-designed CPUs sooner or later, so it won't really matter.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
It depends on what kind of data you are processing. If you are doing 32-bit calculations then you would want to compile your code for 32-bit, assuming your processor can handle it, as most 64-bit CPUs can. If you are doing 64-bit calculations then of course the 64-bit CPU would outperform the 32-bit one, as you would need additional coding steps to simulate 64-bit on a 32-bit architecture: multiple 32-bit operations with bit shifting and the like.
If you took code that was written for 32-bit operations and compiled that code for 64-bit, then the results would be very bad. The same issues were present when computers made the move from 16 to 32 bit.
I would assume that the code written for these tests is 32-bit. If so, then when evaluating a CPU one would have to take into consideration whether 64-bit calculations are needed. Also, before jumping the gun and making a rash decision, some research would need to be done to see if your operating system of choice supports true 64-bit operations. Many of the so-called 64-bit operating systems only use the 64-bit CPU to support memory beyond 4GB.
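The "multiple 32-bit operations" needed to emulate a single 64-bit add look roughly like this. A sketch only: Python integers are unbounded, so the masks below stand in for 32-bit registers, and the function names are made up for illustration:

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Reassemble a 64-bit value from its 32-bit words."""
    return (hi << 32) | lo


def add64_with_32bit_ops(a_lo, a_hi, b_lo, b_hi):
    """64-bit add done as two 32-bit adds plus carry propagation."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32  # 1 if the low words overflowed, else 0
    hi = (a_hi + b_hi + carry) & MASK32  # wraps like a real 64-bit register
    return lo, hi
```

A native 64-bit CPU does this in one instruction; the 32-bit version needs two adds and a carry (an add followed by an add-with-carry on x86), and multiplication or division costs considerably more.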
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
But how much does this really affect the retail price of a cpu? From randomly googling around, it looks like silicon wafer cost only translates to a dollar or two per cpu, so who cares if they can drop this expense by half? Surely other factors would be more important for cost, such as design expense, manufacturing time, and the cost of the fabrication equipment (which would increase if more manufacturing units are needed due to longer manufacturing time).
Power loss and speed seem to be complicated things, and I really doubt this continues to get better as the fabrication process keeps going to infinitesimally small sizes. It seems to be getting hard to keep the electrons confined at these small sizes, and that's going to have to start having a dominant negative impact at some point.
The best way of looking at things. I started off with Intel and stuck with them up to about 1GHz, jumped ship and stayed with AMD until my X2 3800, and now I'm back to Intel with a Duo 2 6600. We'll see in 1-2 years who I'll be with next. The same goes for video cards and soda. Pepsi vs Coke, Nvidia vs ATI. Doesn't matter; whichever gives me what I want at the lowest price.
Of course companies spend billions trying to convince you otherwise, but all products are just commodities and are practically the same in the end.
I guess you're still running that 4GHz P4 then?
The cesspool just got a check and balance.
Can we start rejecting 'scoops' that sound like a radio/TV demolition derby or monster-truck madness advertisement?
v4sw6PU$hw6ln6pr4F$ck 4/6$ma3+6u7LNS$w2m4l7U$i2e4+7en6a2X h
- Each of Barcelona's four cores incorporates a new vector math unit referred to as SSE128
SSE has always been 128-bit (the 64-bit SIMD extensions were called MMX). AMD used to funnel the instructions through a 64-bit execution unit by splitting the work into two halves; the new core has a full 128-bit SSE pipeline, so it doesn't need to split the operations. Nothing new here, just a faster internal implementation. Can this deliver an 80% improvement in benchmark performance? Quite possibly. Take a look at the Core 2 FP performance numbers: it also has a full 128-bit implementation of SSE.
- And separating integer and floating-point schedulers also accelerates this thing called virtualization
Huh. Hardware virtualization affects how the processor handles certain instructions, such as privileged operations. FP instruction execution is unaffected. Virtualized workloads will benefit no more than non-virtualized workloads. Separate issue queues are good, but do they specifically benefit virtualization? No.
- Barcelona blacks out power to individual portions of the chip that are idled, from in-core execution units to on-die bus controllers. This hasn't made it into PCs before
...
Intel call this 'intelligent power capability'. http://www.intel.com/technology/magazine/computin
- Barcelona adds Level 3 cache, a newcomer to the x86
Xeons have featured L3 caches for years. http://en.wikipedia.org/wiki/List_of_Intel_Xeon_m
- Barcelona is genius, a genuinely new CPU that frees itself entirely of the millstone of the Pentium legacy.
- Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there.
Barcelona is an Opteron, with a doubling of cores and some extensions strapped on here and there.
I'm not meaning to detract from AMD here: the fact that they have still not had to make any radical changes to the Opteron micro-architecture is a testament to the quality of the original design. They are slightly ahead of the game on virtualization (they're going to beat Intel to nested page tables), but other than that this chip is playing catch-up. Overall this is going to be a very nice piece of kit to work with. But nothing radical and new here.
G.
This is way more than a mere quad core design. I was hoping to impart that. It is actually dropping some of the legacy x86 architecture internally, and adding big-iron features - the nested paging per core will be a huge plus for businesses that run multiple CPU machines with lots of virtualization for instance.
The separated schedulers for floating and integer math allows for more parallelism, another speed up.
The shared L3 and reduced-latency L2 caches should put Barcelona ahead of Clovertown's split caches, where only 2 cores share a cache; one of the features that made Core 2 Duo faster was that the cache was shared among all of its cores.
I think the real heart of the matter is going to be that these CPUs will far outshine Intel's best in multi CPU rigs, especially business type rigs. They should be on par for single CPU gaming machines, although I'm going to hedge and say AMD might be a little faster on games that can use more than 1 core. (Almost none at the moment, I know, but they'll start coming out soon)
AMD HAS been making design changes to CPUs for the past 8 years, since they created and patented the 64-bit design. It may have taken them a few years to push ahead, but they did. Since the 64-bit creation, AMD has been creating 'new' ways of making CPUs better.
Their "logic" on how Intel is creating a quad core is correct: Intel is essentially just putting 2 and 2 together. AMD, however, is redefining how you put 2 and 2 together to greatly increase the output of 4 cores.
The STNG episode is similar to Orwell's 1984 where O'Brien, the torturer, shows four fingers before Winston's face, O'Brien increases the pain until Winston says that he sees five fingers, finally, Winston actually imagines five fingers.
There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
A quick check reveals that there is more to it than size: IPC, clock rate, wire delay.
CC.
TaijiQuan (Huang, 5 loosenings)
Rumours have it that their next CPU model will be named 'Real Madrid'...
Check out this benchmark from an article on ExtremeTech:
http://www.extremetech.com/image_popup/0,1694,iid=146710,00.asp
The actual article:
http://www.extremetech.com/article2/0,1697,2014647,00.asp
This isn't surprising at all. I believe it's been well known that the FX chips crunch numbers faster than the Core 2 Duo chips. Notice how the FX-62 almost ties or beats the X6800 in some tests. However, in the benchmarks that matter for most personal computer users, the Core 2 Duo is the right choice, blending the right amounts of power in the right places. The FX looks like the marathon sprinter in some areas, but the Core 2 is the triathlete in others.
But AMD is right: it is no quad core, it is a multichip package of two dual cores. So calling it "quad core" is pure marketing speak, because INTEL is lagging behind AMD again. They simply don't have a real quad core yet. So they use those multichip processors to hide the fact that they are behind, and even try to create the illusion that it is AMD which is behind.
Realistically you have to compare an INTEL quad core with an AMD dual-processor dual-core setup. But that is not fair either, because now the AMD rig has two independent memory systems and so has a big advantage on big data sets. But it also needs much more space. So basically the multichip module from INTEL only saves you space on the mainboard; technically it is a dual-processor setup with a bad memory path.
And that could hurt AMD's real quad cores, because if everybody in the marketplace associates quad cores with INTEL's multichip module, they might severely underestimate the performance AMD's processor (hopefully) has. And that would hurt AMD's sales.
The problem I have with performance/watt is that it distorts the true "value" to the system owner. You NEED to break it down, because while power usage is important, the real issue comes down to "is the higher performance WORTH the extra power the chip draws". I personally don't CARE about performance/watt, except when the power draw is excessive, and I believe that is how MOST people will look at it.
Most laptop processors have a higher performance/watt than desktop processors because they are designed with battery life in mind. What people want is a processor that goes faster, but doesn't suck a huge amount of power to get that performance increase. The Pentium 4 got a LOT of flak toward the end with Prescott because the power demand was so far above the benefits that extra power provided. If it had drawn only ten percent more than an Athlon 64 at the time, then no one would have been bothered by it, unless you are talking about a data center, where the price of electric power is a very important consideration.
The only reason the whole topic of fab process improvements is even brought up is that Intel is afraid of AMD. Intel has amazing resources when it comes to money and the ability to pay a lot more into their R&D, but in spite of this, AMD was seen as the performance leader before the Core 2 Duo came out, and AMD has the potential to come back and beat Intel again once K8L is released. It goes to show that if you spend some time looking at how to improve the overall system design and how things fit together, performance will go up a LOT.
Is the slashdot story queue this long? Or how much did AMD pay to get this old article reposted to try and keep the last two people who haven't jumped ship to Intel Core 2 Duo around?
It's not the cost of the wafer itself. It's the fact that hundreds of fabrication steps have to be applied individually to each wafer. So the cost of each fabrication step is roughly multiplied by the number of wafers you need to process. Fewer wafers allow for lower costs.
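Put as arithmetic: the processing cost is paid per wafer, so the cost per chip falls as more good dies come off each wafer. A sketch with made-up numbers (the $5,000 per-wafer figure and the die counts are purely illustrative, not real fab data):

```python
def cost_per_die(wafer_processing_cost, dies_per_wafer, yield_fraction=1.0):
    """Per-die share of the cost of running one wafer through every fab step."""
    good_dies = dies_per_wafer * yield_fraction
    if good_dies <= 0:
        raise ValueError("need at least some good dies per wafer")
    return wafer_processing_cost / good_dies
```

With a hypothetical $5,000 of processing per wafer, 200 dies per wafer works out to $25 each, and 400 dies per wafer to $12.50 each. Yield cuts the other way: the same 200-die wafer at 50% yield costs $50 per good die.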
"Who the hell cares?" How about people that like to read news for nerds?
Just a thought.
Context-switching has long been the weakest design point for x86 in "PCs", especially servers. x86 arch is rooted in single-user, single-threaded, single-context apps. The in-core registers that CPU operations execute directly against have to be swapped out for each context switch. In *nix, that means every time a different process gets a timeslice, it's got to execute two slow copies between registers and at best cache RAM, at worst offchip RAM (over some offchip bus). If the register count is larger than the bus width (even onchip), that's another multiple on that slow cycle. That context-switch overhead can be larger than the timeslice allocated to each process's "turn" in the schedule for lower-latency / higher-response (lower "nice") processes, approaching realtime.
Unix was designed for multiusers, context-switching from the beginning. The chips it's run on coevolved with it. Linux arrived when x86 CPUs ran fast enough that context-switching was OK, but still a big waste compared with, say, MicroVAX multiple register sets. Windows architecture is rooted in the x86 architecture that DOS was designed for, though perhaps Vista has finally lost all of the old design baggage originated in the 8088/8086, but its long history of UI multitasking means it's context-switching all the time, which will gain in speed. The MacOS switch to BSD means it's got lots of power bound up in the context switches that could be released with Barcelona.
So while low-level benchmarks might show something like 80% FPU improvement, the high level (application) performance could improve quite a lot more. Recompiling apps to machine code that exploits more registers without the context-switching penalties could find multiples, especially apps with realtime multimedia that run concurrently with other apps. Intel's hyperthreading already gets past some of these bottlenecks in distributing tasks among multiple cores, but the Barcelona paging tables go even deeper, for likely extra performance (on top of Barcelona's own hyperthreading and new L3 cache).
Aside from the marketing "vapormarks" we'll surely see out of AMD (and their sockpuppets) before it's actually released "midyear", I'm looking forward to seeing how this thing really runs in multitasking apps. I'm expecting "like a greased snake across a griddle".
--
make install -not war
Which part of "2009 timeframe for Larrabee" did you fail to understand? Both AMD and Intel have other things in store for 2 years from now, and it's not clear which is going to be better. AMD seems to be moving in the same direction of many microcores with its ATi GPUs (google for it on the Inquirer), and currently they have more experienced engineers on it, so it's a bit premature to discount them. Intel, on the other hand, has to execute a first-gen GPU on par with ATi's. And I'm curious what nVidia is thinking of all this.
While Intel may have jumped on quad-core early with discrete cores, at least they wedged them into the same package, running on motherboards good for something else besides a single run of parts.
Actually, you're a total fucking idiot. Beyond the drab, pitiful hole of your existence there is another, where the externalities of manufacturing and power consumption appear! Gee! Think of that, another brain-dead worthless sack of human flesh can't think beyond his own ignorance! Wee!!
Actually, from the important perspective of the difficulty of building a new machine around it, the Intel "dual-dual" core chips really are quad core -- they drop into the same socket as the previous dual core chip, placing four cores into the socket. That certainly helped speed the time to market for the chip.
If you mod me down, I shall become more powerful than you could possibly imagine.
Since when did Slashdot start using a known fanboy's quote to help get eyeballs? Tom Yager has been known to preach his AMD cult for a long, long time. He frequently talks at AMD-sponsored seminars. In his InfoWorld column, he can't last a week without praising AMD for the next breakthrough or putting clumsy spins on AMD's bad news. Like John Dvorak and Robert Scoble, this guy has zero credibility. Quoting him is like quoting all the telecom analysts when the NASDAQ was at 5000.
On another thought, if Slashdot is playing the whole good/AMD/Apple/Linux vs. evil/Intel/Microsoft/Windows game, Tom Yager can be one of the official trash-talkers.
for multi-core CPUs! Launch a monitor and watch your CPUs. Benchmarking a multi-core CPU with a single-threaded program is really broken. It is really pathetic to see all those single-threaded programs run on my Core 2 Duo. I'm using Linux and monitoring the CPUs using gkrellm. It's really sad to see all those single-threaded programs from the '90s being twice as slow as they should be (and four times as slow on a quad-core, etc.).
...but does it have an IOMMU?
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
Obsession about process size is sillier than obsession over clock speeds.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
If you move to a smaller transistor size, you get more processors per wafer (thus it gets much, much cheaper) and you lose less of your wafer to small defects (again, cheaper). Moving data around smaller chips is faster (perhaps better for your workload, possibly not).
It will also use less power to change transistor state and will use less voltage (though increased leakage until recently minimized this effect).
Now there are dozens of other characteristics but by saying "we've moved to a lower die size" they are pretty much addressing all of your "simple" concerns.
We're not testing the compiler. IMHO, turning optimization OFF would be a fine idea, or at least unobjectionable.
The only important thing is that the compiler choices and options are fair. Using gcc on the Opteron and icc on the Core Duo would not be fair. Using gcc everywhere, with the same options, is completely fair.
One can also define "fair" as "all systems tweaked to the max", but this is rather difficult to do right. (see also: OS benchmarks, where the benchmarker knows all the ways to tweak the OS he uses most often)
Java is big-endian, like the SPARC and G4.
Java has strictly-defined floating-point math that is incompatible with the x86. An x86 chip must save floating-point options out to memory to force the exponent to be the right size.
JIT/emulation systems in general, including Java, do better with more registers. The G4 has about 6x as many once you exclude registers that are unavailable. (about 5 for x86, but at least 30 for the G4)
The proper fix is to run multiple copies of the benchmark.
I'm using Linux, with single-threaded apps, but so what? I run lots of things at once:
X, window manager, xterm, editor -- that is 4, plus the kernel
X, xterm, tar, gzip -- that is 4, plus the kernel
X, xterm, make, bash, cc1, cc1, cc1, gas, gas, ld... -- that's a lot of things!
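A minimal sketch of the "run multiple copies of the benchmark" idea, assuming a trivial busy loop as a stand-in for a real single-threaded benchmark. The names `BENCH_SNIPPET` and `run_copies` are made up for illustration; nothing here comes from an actual benchmark suite.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a single-threaded benchmark: a small busy loop.
BENCH_SNIPPET = (
    "total = 0\n"
    "for i in range(200_000):\n"
    "    total += i * i\n"
    "print(total)\n"
)


def run_copies(copies):
    """Start `copies` independent interpreter processes at once and wait for all.

    Returns the printed result of each copy and the wall-clock time for the batch.
    The OS scheduler is free to place each process on a different core.
    """
    start = time.perf_counter()
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", BENCH_SNIPPET],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(copies)
    ]
    outputs = [proc.communicate()[0].strip() for proc in procs]
    elapsed = time.perf_counter() - start
    return outputs, elapsed


if __name__ == "__main__":
    for n in (1, 2, 4):
        _, seconds = run_copies(n)
        print(f"{n} copies finished in {seconds:.2f}s")
```

On a machine with enough idle cores, the batch time should grow much more slowly than the number of copies, which is the point of measuring throughput this way rather than timing a single thread.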
The main use of vector units is running crappy Windows gamer "benchmarks" and MacOS Photoshop "benchmarks". The games don't even use the vector units all that much. It's just the benchmarks that use the vector units.
In the real world, vector units aren't good for much at all. You can do radar processing with them, but that isn't exactly a desktop app. Linux can use them for software RAID.
> ...will work even better...
If it works at all... maybe. Meandering capacitances between traces, variable leakage currents and higher susceptibility to electromigration mean that a substantial rework of the masks is required in order to move from 65 to 45nm.
Yes, you can make it faster, and yes you can make it cheaper, but you can't make it as reliable, nor can you make it use a higher proportion of its power for computationally useful activities -- not without a substantial change in architecture, using error correction, reversible logic, self-timed circuits, &c.
So far Intel's superior process tech and AMD's superior architecture have passed the performance crown back and forth for several years, but it remains to be seen how long this condition will remain cyclically stable. I rather expect architecture to trump process in the next iteration, as AMD's partnership with IBM is likely to bring process nearer to parity, while Intel's architecture group is simply not competitive -- and revamping their staffing enough to make the kind of revolutionary change needed to recapture the lead would be too risky.
-I like my women like I like my tea: green-
Larrabee is planned for 2009.
By then, both ATI/AMD and nVidia plan to introduce their own "multiple chips on a single card" families.
The current (for nVidia) / next (for ATI/AMD) lines of DirectX 10 cards will be the last single-super-GPU lines, where the different models in a line differ only by clock speed and disabled pipelines.
From G90 and R700 onward, which will probably appear around 2008-2009 too, graphics cards will look like the multi-chip cards pioneered by 3DFX (Voodoo 4: 1 VSA, Voodoo 5 5000/5500: 2 VSA, Voodoo 5 6000: 4 VSA, AAlchemy's custom PCI boards: Fucking-8 VSA on a single oversized card).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Intel applied concerted effort to kicking its own nuts during the P4 years. If AMD couldn't deliver a knockout blow then, it sure as hell won't now that Intel has a real lineup.
Not the instruction set, but its binary encoding.
Let's say a system encodes each instruction in a single byte, where the high nibble is the command (0Bxh are all moves) and the low nibble is the register (0x3h all operate on register C, so opcode 0B3h moves something into register C).
You can extend this machine from 8 registers to 16. But if you want to extend it to 32 registers, you'll have to use 2 bytes to encode them (first byte the command, second byte the register; or the same opcodes as before in the second byte, with the first byte being a prefix that says "use the upper 16 registers").
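The toy encoding above can be sketched in a few lines. This is an illustration of the made-up one-byte machine from this comment, not of real x86 or AMD64 encoding; the prefix value `PREFIX_UPPER` and all function names are assumptions chosen for the example.

```python
MOVE = 0xB           # command nibble for "move", as in the 0B3h example
PREFIX_UPPER = 0x0F  # hypothetical prefix byte meaning "use the upper 16 registers"


def encode(command, register):
    """Pack a 4-bit command and a 4-bit register number into one byte."""
    if not 0 <= command <= 0xF:
        raise ValueError("command must fit in 4 bits")
    if not 0 <= register <= 0xF:
        raise ValueError("register must fit in 4 bits")
    return bytes([(command << 4) | register])


def decode(byte):
    """Split one opcode byte back into (command, register)."""
    return byte >> 4, byte & 0xF


def encode_prefixed(command, register):
    """Encode for a 32-register machine using the prefix scheme.

    Registers 0-15 keep the old one-byte form; registers 16-31 cost an
    extra prefix byte in front of the same one-byte opcode.
    """
    if not 0 <= register <= 31:
        raise ValueError("register must be in 0..31")
    if register < 16:
        return encode(command, register)
    return bytes([PREFIX_UPPER]) + encode(command, register - 16)


if __name__ == "__main__":
    print(encode(MOVE, 3).hex())            # one byte for register 3
    print(encode_prefixed(MOVE, 19).hex())  # two bytes for register 19
```

The size cost shows up directly: any instruction touching registers 16-31 is twice as long as before, while old code for registers 0-15 stays byte-for-byte identical.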
Now, taking the example of the x86 to AMD64 extension, moving from 8 to 32 registers might have increased the average opcode size from 3 bytes to 5 bytes, which would have the following disadvantages:
- The same code eats up a lot more memory in 64-bit mode.
- Less code in cache
- Maybe slower instruction decoding stage in pipeline
- Code starts to look a lot different from legacy code (opcodes for 32-bit instructions do not have the same binary encoding as their 64-bit counterparts), which may eat up more silicon real estate (two different instruction decoders instead of a single one). For example, when 32 bits was introduced with the 386, most of the opcodes were the same in 16-bit and 32-bit modes; only a prefix was needed to specify when code used the "other" size (16 bits in 32-bit code or vice versa).
Some ARMs have dual-mode opcodes: either opcodes encoded in 32 bits (4 bytes per opcode) or Thumb mode (2 bytes per opcode). In Thumb mode, not all functionality is available (not all registers are usable by some instructions; only jumps are conditional, not arithmetic). To get the full capability, 32-bit mode has to be used.
For ARM (the R means RISC), the instruction set is small, and the decoding isn't complicated and doesn't eat much space, so adding a second "Thumb" mode for denser code isn't that hard and doesn't eat much space (especially because Thumb mode is less complex).
For AMD64, maybe it's not that easy, because the x86 opcodes are already a mess (thanks to all the legacy, including some dating back to the 8080 8-bit predecessor [not binary compatible, but at least source-code compatible]), and implementing alternative opcodes for 64-bit mode, completely rethought from the beginning, might have used too much silicon (it's already nice that they cleaned up the memory model for 64-bit mode).
But are those costs equal for the 45nm versus 65nm versions?
And then finally, the Joe Six-Pack user will be able to run his current work (browsing for porn?), the resource-eating Microsoft OS, and the huge pile of spambots/spyware/viruses/Sony rootkits/trojans/etc., all together, without the main task having to be switched out by the multitasking scheduler in favor of the others, and thus for the first time won't observe such a massive slowdown 2 days after the anti-virus license expired...
Awful Macro Devices/Awful Technologies Inc will continue to lose to Intel. Here's to hoping that evil business will cease to exist by the end of this year.
Therein lies the humorous part. Intel's "real lineup" is nothing more than a few minor tweaks and the application of modern processes to 10-year-old tech. You might be able to whip the buggy harder, add a second horse, but eventually the locomotive just wears you out as it passes you. (Even if it's quarter-scale)
As for delivering a knockout blow with the P4, I wouldn't count my chickens just yet. AMD's offerings still smoke Intel's in the server market, especially anything over 2 CPUs.
The cesspool just got a check and balance.
If all you want is a cheap powerful processor to put into your gaming PC, then yeah, you don't care who dethrones whom. But if you work in the industry (and right now I'm writing documentation for an Opteron-based HPC system) then the Intel-AMD struggle is very interesting indeed. Every time AMD scores an upset over Intel, the whole marketplace changes.
And while you may be content to buy cheap technology 6 months after it's introduced, not everybody who buys hardware has that luxury. If you're spending 6 or 7 figures for a rack full of high-performance computers, then every little twitch in performance or pricing makes a big difference to your bottom line.
The costs don't need to be equal to pay off. Keeping other factors constant, anything less than (65/45)^2 (about 2X) would cost less per CPU.
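A rough sketch of the arithmetic behind that (65/45)^2 figure, assuming die area scales with the square of the feature size and that yield, wafer size and design are otherwise unchanged. Those are simplifying assumptions, and the function names are made up for illustration.

```python
def break_even_wafer_cost_ratio(old_nm, new_nm):
    """How many times more a wafer may cost on the new process before the
    per-die cost stops improving, assuming die area scales with (size)^2."""
    return (old_nm / new_nm) ** 2


def cost_per_die_ratio(wafer_cost_ratio, old_nm, new_nm):
    """Per-die cost on the new process relative to the old one.

    A value below 1.0 means each CPU is cheaper to make.
    """
    return wafer_cost_ratio / break_even_wafer_cost_ratio(old_nm, new_nm)


if __name__ == "__main__":
    limit = break_even_wafer_cost_ratio(65, 45)
    print(f"break-even wafer cost ratio: {limit:.2f}")  # about 2.09
    # Hypothetical case: the 45nm wafer costs 1.5x as much to process.
    print(f"per-die cost ratio at 1.5x: {cost_per_die_ratio(1.5, 65, 45):.2f}")
```

So under these assumptions a 45nm wafer could cost up to roughly twice as much as a 65nm wafer and still break even per CPU.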
If 65nm, 45nm or 10mm consumed the same amount of power or were equally susceptible to random errors, then from a consumer standpoint I would agree with you. But that is not the case. The smaller process allows an equal amount of computing to be done with less electrical power. For instance, let's say the 65nm processor uses 50 watts and the 45nm processor uses 45 watts. That may not seem like much, but let's assume that you now apply that to 1 million PCs that run the processor 24x7 for three years. This results in:
Lbs CO2: 134,405,700
Tons CO2: 67,202
Equivalent in cars: 3,875
Equivalent gallons of gas: 6,870,139
Acres of trees needed to offset: 6,109.63
Mature trees needed to offset: 2,986,793
Trees planted to offset: 716,830
If you just want to look at what a person would save by running 1 new processor for the same 3-year period, it results in:
Equivalent gallons of gas: 6.87
Mature trees needed to offset: 2.99
So even if you just purchase 1 chip, it will save you about $18 over the 3 years and will be more eco-friendly. In addition, lower-power chips tend to be more reliable, so the average consumer is less likely to have to pay for a repair over the same 3-year period, which also has a tangible value. So all in all, you should care about the new processes.
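The energy side of those figures can be checked with a short calculation. This sketch uses only the numbers given above (50 W vs 45 W, 24x7, three years, 1 million PCs, about $18 saved per chip); the electricity price is not stated in the comment, so the code backs it out from the $18 figure rather than assuming one. Function names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def kwh_saved(old_watts, new_watts, years, machines=1):
    """Energy saved in kWh by running at new_watts instead of old_watts, 24x7."""
    watts_saved = old_watts - new_watts
    return watts_saved * HOURS_PER_YEAR * years * machines / 1000.0


def implied_price_per_kwh(dollars_saved, kwh):
    """Electricity price implied by a stated dollar saving (a derived value)."""
    return dollars_saved / kwh


if __name__ == "__main__":
    per_pc = kwh_saved(50, 45, years=3)
    fleet = kwh_saved(50, 45, years=3, machines=1_000_000)
    print(f"per PC: {per_pc:.1f} kWh over 3 years")
    print(f"1 million PCs: {fleet:,.0f} kWh over 3 years")
    print(f"implied price: ${implied_price_per_kwh(18, per_pc):.3f} per kWh")
```

A 5 W difference works out to about 131 kWh per PC over three years, so the $18 figure corresponds to an electricity price of roughly 14 cents per kWh.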