AMD Showcases Quad-Core Barcelona CPU
Gr8Apes writes "AMD has showcased their new 65nm Barcelona quad-core CPU. It is labeled a quad-core Opteron but, according to InfoWorld's Tom Yager, is really a redefinition of x86. Each core has a new vector math processing unit (SSE128), separate integer and floating point schedulers, and new nested paging tables (to vastly improve hardware virtualization). According to AMD, the new vector math units alone should improve floating point performance by 80%. Some analysts are skeptical, waiting for benchmarks. Will AMD dethrone Intel again? Only time will tell."
Things have come a long way since the heady days of bit slice processors. The first microcode I wrote was for an XOR operation - I could not think of anything simpler, that would actually do something useful...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Anyone know what "SSE128" means? SSE registers have been 128 bit from day one.
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
Obsession about process size is sillier than obsession over clock speeds.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
Advanced users are users too!
"Will AMD dethrone Intel again?" Dear AMD, meet Larrabee. http://www.theinquirer.net/default.aspx?article=37548
AMD might kick Intel in the nuts a little but definitely not dethrone.
As long as AMD and Intel continue to chase each other in the x86 market, high end chips become low end in the span of six months. Just keep buying 6 months behind the press releases and you get great processors for next to nothing.
Weaselmancer
ridiculous.
Am I missing something or am I completely wrong?
... and you can wake me up when the processors themselves are smaller than a square millimeter. I need them for my new thumb-top computer design.
Keeping in scientific fact, how much heat has to be generated for 1 MIPS?
The fact is, absolutely none. It has been shown that only the destruction of information via AND and like instructions creates entropy (heat). As long as you use only 3 types of gates (pass through, not, xor), you can create a heat-free CPU. Provided we do want to check for bit errors, we could maintain a very low heat via ECC-like checking. Estimates on that are 10^8 lower than present.
We could keep 98% of our efficiency of current day chips if we switched to this method.
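A rough sketch of the argument above, assuming the usual reading of Landauer's principle (erasing one bit costs at least k*T*ln 2 of heat). The gate functions and names are made up for illustration. Note that a plain two-input, one-output XOR also throws a bit away; the reversible form keeps one of its inputs (often called CNOT), and that is the version modelled here.

```python
import math
from itertools import product

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum heat released when one bit of information is erased."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


def is_reversible(gate, n_inputs: int) -> bool:
    """A gate is logically reversible if distinct inputs give distinct outputs."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    return len(set(outputs)) == len(outputs)


def and_gate(a, b):
    return (a & b,)      # two bits in, one bit out: information is destroyed


def not_gate(a):
    return (a ^ 1,)      # one bit in, one bit out: can always be undone


def cnot_gate(a, b):
    return (a, a ^ b)    # XOR that keeps one input, so the other can be recovered


if __name__ == "__main__":
    for name, gate, n in (("AND", and_gate, 2), ("NOT", not_gate, 1), ("CNOT", cnot_gate, 2)):
        print(name, "reversible:", is_reversible(gate, n))
    print("Landauer limit at 300 K (J per erased bit):", landauer_limit_joules(300.0))
```

The limit is a lower bound per erased bit, not a statement about what any real chip dissipates.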
....is in the eating. Until we see benchmarks everything is just ignorant speculation.
In my own benchmarks (generic C integer and floating point scientific code) I have found that the Core Duo and Core 2 Duo aren't all that quick compared with an AMD64. Clock for clock the AMD64 Opterons we have are about 50% quicker than an equivalent Core 2 Duo for integer work. I know this doesn't agree with all the usual magazine benchmarks but they are heavily biased towards using SSE instructions where possible and it is SSE where the Core 2 Duo has been a real improvement over previous Intel designs and also bests the AMD chips. Hopefully, AMD has recognised this and the new SSE implementation will bring them back on par with Intel for these benchmarks but even today an AMD64 processor is a beast and more than a match for anything Intel produces.
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
I would care. POWER CONSUMPTION/EFFICIENCY. If I want a space heater, I'll stick with a 3.4 GHz P4 with HyperThreading. I DON'T WANT ONE. As it is, for what I like doing and for what I want to do, current-gen processors work just fine. I can play my games, make my music, draw shit, upload data, and check out sites like this, while maintaining my bank account, talk with other people, and more, at the same time. I got over the clock speed thing the second I actually owned a G3. Granted, Windows emulation sucked balls, but what was made for that computer, it once again, did what I wanted it to do. Sometimes better, sometimes worse. YMMV. If anyone wants bragging rights of ANY sort, the technology to shrink the die-size is the way to go. Go read a book called "Nanotime," IIRC. THAT is what I'd like to see, minus superdense C4 that's equivalent to a mini tacnuke, and those insane driving laws and shit.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
"Lets make a Octa-core processor!"
Oh, here's one. Though it's been out since before Intel had quad-core chips.
But will it run Linux?
Tubby or not tubby. Fat is the question
No, but a good, hard, well-aimed, holding-nothing-back kick in the nuts can leave them impotent, so they'll have to do some ugly procedures to survive it in the long run. A couple of identical blows in the meantime could leave them sterile. So if the current setups begin to die out and Intel has no more babies waiting, they will not be dethroned, but will be getting an honourable mention in the history books.
If you don't like my sig then don't read it.
Clock speed doesn't mean crap anyways. It's all in the code. I see guitar tuning programs for the computer... TEN megs in size, slow as hell, and inaccurate! I believe APTuner is FAR smaller than most, faster and far more accurate. People just don't know how to code, plus the fastest ways to code are copyrighted, which they shouldn't be since they'd be utterly obvious to any programmer with that standard "ordinary" knowledge in that language, so one has to make workarounds that inevitably end up being slower. No more oldskool hacker ethic, now it's greed.
45nm is not inherently "better" than 65nm any more than 3Ghz is inherently "better" than 1Ghz. A smaller process size is a means to an end, it's not an end in itself.
The end is the delicate balance of improving performance / watt while increasing overall performance and keeping the price down. If AMD can deliver a chip that does a better job of that at 65nm than an Intel 45nm one, then the AMD chip is not somehow "worse" than the Intel one just because it doesn't use 45nm. That's just stupid.
I'm not saying AMD can do that, but I think that criticizing them for not being ready for 45nm yet is more than premature.
AMD's actually guilty of the same flawed logic though - their criticism of Intel's 4 core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
Read the article: that is an x86 GPU, and it wouldn't be able to compete with general-purpose CPUs.
Three quad cores for the pasty-nerds under the sky,
Seven for the WoW-nerds in their halls of stone,
Nine for Diablo Men doomed to die,
One for the Dark Nerd on his dark throne
In the Land of Silicon where the corporations lie.
One quad core to rule them all, One quad core to find them,
One quad core to bring them all and in the darkness bind them
In the Land of Silicon where the corporations lie.
He paused, and then said in a deep voice,
This is the Master-quad core, the One quad core to rule them all.
Who says news can't come from an advertising section? It's still a source of information.....
I don't think I've ever read such an admiring review of a CPU design. The last time I remember a chip sounding so fantastic was the Alpha, something like ten years ago. If all the new things really work the way they sound in theory, well then yeah, I guess it's evident this Barcelona thing is really going to be something else.
The design for VM performance sounds extra interesting.
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail.
This is Slashdot. We care about those details. You can read more about the "super fast, super cool, super cheap!" market speak on the company's official press releases section.
I want a floating quad.
sometimes, nothing.
I'm not kidding. In the SSE I'm familiar with, one of the input registers is always an output register, which means its contents are destroyed. Another flaw is that there aren't enough registers... SSE uses 8, where 32 are commonly not enough when latency is longish (especially with SoA-style programming, where pragmatically a single vec3 occupies 3 128-bit registers).
... or Madd. You know, multiply-add. Does it have that?
So now I'll see four penguins at startup!
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
What really interests me is how it compares in single and double precision calculations. If AMD gets in the range of Itanium performance, will Intel follow and kill their own Itanium by boosting Core 2 FP?
Feature size has dominated progress (as measured either by raw performance or performance per watt) over an unbroken 30 year period. Do you recall the very passionate debates about RISC vs CISC? Did a RISC design at one feature size ever beat a CISC design at the next shrink? I think not. Design has never mattered anywhere near as much as feature size. Not that you can't get design wrong. But then you can get a shrink wrong, too, and end up with 1% yields. AMD managed briefly to remain competitive with Intel playing a full shrink behind when Intel did that rather stupid marketron-driven face-plant into the thermal wall (against good advice from their Israel team, who later came to the rescue with Core Duo).
With the recent skyrocket of leakage current, the holy grail of feature size is somewhat tarnished, but it still dominates the performance curve. You completely missed the relationship between feature shrinks and the performance crown. If Intel has better process technology than AMD (almost always) and AMD has a better design (most of the time since the Athlon was first launched) and both companies shrink every 18 months following the Moore projection (that unbroken 30 year historical trend) and AMD always shrinks 9 months behind Intel, then the performance crown will pass back and forth exactly as often as either company announces their next product.
So I agree with you: feature size has no importance to the customer who wants performance for their dollar. Except that you can set your clock by it and project ten years into the future effective performance levels of shrinks we haven't even seen yet. Except for that part, yeah, I'm with you.
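The crown-swapping argument above can be sketched as a toy model. Everything numeric here is an assumption for illustration: each shrink multiplies performance by a fixed `shrink_gain`, the vendor with the better design carries a constant `design_edge` that is smaller than one shrink's gain, and it shrinks `lag` months after the process leader.

```python
def leader_timeline(shrink_gain=2.0, design_edge=1.4, period=18, lag=9, months=72):
    """Return (month, leader) at every product launch in the toy model.

    'process' = vendor that shrinks first, every `period` months.
    'design'  = vendor with the better design, shrinking `lag` months later.
    Sampling every `lag` months assumes `period` is a multiple of `lag`.
    """
    timeline = []
    for month in range(0, months, lag):
        process_shrinks = month // period + 1
        design_shrinks = (month - lag) // period + 1 if month >= lag else 0
        process_perf = shrink_gain ** process_shrinks
        design_perf = design_edge * shrink_gain ** design_shrinks
        leader = "process" if process_perf > design_perf else "design"
        timeline.append((month, leader))
    return timeline


if __name__ == "__main__":
    for month, leader in leader_timeline():
        print(f"month {month:2d}: {leader} leader holds the crown")
```

With these made-up numbers the lead flips at every launch, which is the pattern the comment describes. Make `design_edge` larger than `shrink_gain` and the design leader never loses it.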
Well, until you show us your source code those numbers are as believable as anything else one might randomly type here...
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Mr. Coward, in case you have not read the article, the conversation actually is about AMD's new processor, which is a real processor. That processor will generate some amount of heat ... real heat, not theoretical heat.
A few years ago people would've ...
What are you referring to? I never mentioned OOP, relativity, or satalites (sic) in my post.
To be fair, I suspect that the resulting information from the computation itself carries... well... information.
You are a coward but at least you are fair, that's good to know. A fair coward is better than a regular, run-of-the-mill coward, at least in my book. Yes, information carries information. Do you want a Nobel prize for that? O perhaps an honorary PhD degree?
All good computer scientists know information has a direct corolation (sic) to entropy
And even better computer scientists know how to spell, or at least use a spell checker.
It seems like you need to listen to your own advice and quit pointlessly spewing shit at people who have useful things to say.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
They care. Just moving the chip from 65 nm to 45 nm means you can produce roughly twice as many chips on the same silicon wafer. Also, if a 65 nm chip performs well, then a 45 nm version of it (with slight modifications of course) will work even better.
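The "twice as many" figure follows from area scaling with the square of the linear shrink. A minimal sketch, assuming an idealised shrink where every dimension of the die scales with the process node (real designs do not shrink this cleanly); the function name is made up:

```python
def ideal_dice_ratio(old_nm: float, new_nm: float) -> float:
    """Ratio of dice per wafer after a pure linear shrink (idealised)."""
    linear_scale = new_nm / old_nm      # each edge of the die shrinks by this factor
    area_scale = linear_scale ** 2      # area shrinks with the square of that
    return 1.0 / area_scale             # so dice per wafer grow by the inverse


if __name__ == "__main__":
    print(round(ideal_dice_ratio(65, 45), 2))  # about 2.09
```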
Opus: the Swiss army knife of audio codec
Likely Intel has an edge because they are [almost] ready for 45nm process, while AMD is just getting started on 65nm.
But it is interesting to see the two companies approach the problem from different ends. Do you improve the silicon process or do you alter the architecture and instruction set? I bet you the best answer will be to do both.
quad cores that actually share cache would be nice. these double duals kind of suck because architecturally they can never share cache. although AMD and Intel don't have very many dual cores that can even share cache within themselves. (although I think Intel is releasing one soon?)
“Common sense is not so common.” — Voltaire
You haven't really added much from your original post. Die shrink is an implementation detail, probably something you read that sounded "futuristic"... The real goals are performance, and (usually secondarily) power consumption. Doesn't really matter how they achieve those goals.
I agree with you on one point - I think as with your requirements, the goal for the average non-technical home consumer should be focused more on efficiency than multi-core 64 bit 4MB cache, etc. But not everyone spends 95% of their time browsing the web and typing in their casserole recipes. Try to run a parallel make on a large project and you immediately appreciate a processor with a high clock rate, multiple cores, and huge cache. I don't need to read sci fi to know what I need to do my REAL job.
8 core (two quad core chips in a single package) is already on Intel's internal roadmaps.
(this was anonymous for a reason)
well if your cpu ever gets powerful enough to do some sort of extremely computationally expensive compression (like with fractals or something?). maybe you could squeeze a little bit more out of your slow link?
I like dial-up, nobody can call me (one phone line, disable call waiting), and I really only do IRC and text browsing. Honestly who wants to give the cable company or phone company $50 a month, those bastards are rich enough.
AMD's actually guilty of the same flawed logic though - their criticism of Intel's 4 core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
I don't think it's about that. Intel quickly pumped out something that looks like a 4-core CPU, which took far less time than developing a new quad-core design, and that makes AMD seem to lag behind. So what can AMD do to explain? Not much really, besides what they said above: that Intel got there quicker because it's not really a quad-core design. Everyone will have their own multicore CPU designs sooner or later, so it won't really matter.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
It depends on what kind of data you are processing. If you are doing 32 bit calculations then you would want to compile your code for 32 bit, assuming your processor can handle it, as most 64 bit CPUs can. If you are using 64 bit calculations then of course the 64 bit CPU would outperform the 32 bit one, as you would have to do additional coding steps to simulate 64 bit on a 32 bit architecture: multiple 32 bit operations with bit shifting and the like.
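To make the "multiple 32 bit operations" point concrete, here is a small sketch of a 64 bit add built only from 32 bit words and a carry. It is an illustration, not what any particular compiler emits, and the helper names are made up:

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Rebuild a 64-bit value from (low, high) 32-bit words."""
    return (hi << 32) | lo


def add64_with_32bit_ops(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values using only 32-bit sized results plus a carry."""
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0          # the low word wrapped around
    hi = (a_hi + b_hi + carry) & MASK32    # the carry ripples into the high word
    return lo, hi


if __name__ == "__main__":
    a, b = 0x00000001FFFFFFFF, 0x0000000000000001
    result = join64(*add64_with_32bit_ops(*split64(a), *split64(b)))
    print(hex(result))
```

A 64 bit CPU does the same add in one instruction, which is the advantage the comment is pointing at.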
If you took code that was written for 32 bit operations and compiled that code for 64 bit then the results would be very bad. The same issues were present when computers made the move from 16 to 32 bit.
I would assume that the code written in these tests is 32 bit. If so, then when evaluating a CPU one would have to take into consideration whether they needed 64 bit calculations. Also, before jumping the gun and making a rash decision, some research would need to be done to see if your operating system of choice supports true 64 bit operations. Many of the so-called 64 bit operating systems only use the 64 bit CPU to support memory beyond 4GB.
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
But how much does this really affect the retail price of a cpu? From randomly googling around, it looks like silicon wafer cost only translates to a dollar or two per cpu, so who cares if they can drop this expense by half? Surely other factors would be more important for cost, such as design expense, manufacturing time, and the cost of the fabrication equipment (which would increase if more manufacturing units are needed due to longer manufacturing time).
Power loss and speed seem to be complicated things, and I really doubt this continues to get better as the fabrication process keeps going to infinitesimally small sizes. It seems to be getting hard to keep the electrons confined at these small sizes, and that's going to have to start having a dominant negative impact at some point.
The best way of looking at things. I started off with Intel and stuck with them up to about 1GHz, jumped ship and stayed with AMD until my X2 3800, and now I'm back to Intel with a Core 2 Duo 6600. We'll see in 1-2 years who I'll be with next. The same goes for video cards and soda. Pepsi vs Coke, Nvidia vs ATI. Doesn't matter; whichever gives me what I want at the lowest price.
Of course companies spend billions trying to convince you otherwise, but all products are just commodities and are practically the same in the end.
I guess you're still running that 4GHz P4 then?
The cesspool just got a check and balance.
Can we start rejecting 'scoops' that sound like a radio/TV demolition derby or monster-truck madness advertisement?
v4sw6PU$hw6ln6pr4F$ck 4/6$ma3+6u7LNS$w2m4l7U$i2e4+7en6a2X h
- Each of Barcelona's four cores incorporates a new vector math unit referred to as SSE128
SSE has always been 128bit (the 64bit SIMD extensions were called MMX). AMD used to funnel the instructions through a 64bit execution unit by splitting the work into two halves; the new core has a full 128bit SSE pipeline so doesn't need to split the operations. Nothing new here, just a faster internal implementation. Can this deliver an 80% improvement in benchmark performance? Quite possibly. Take a look at the Core2 FP performance numbers - it also has a full 128bit implementation of SSE.
- And separating integer and floating-point schedulers also accelerates this thing called virtualization
Huh. Hardware virtualization affects how the processor handles certain instructions such as privileged operations. FP instruction execution is unaffected. Virtualized workloads will benefit no more than non-virtualized workloads. Separate issue queues are good, but do they specifically benefit virtualization? No.
- Barcelona blacks out power to individual portions of the chip that are idled, from in-core execution units to on-die bus controllers. This hasn't made it into PCs before
...
Intel call this 'intelligent power capability'.
http://www.intel.com/technology/magazine/computin
- Barcelona adds Level 3 cache, a newcomer to the x86
Xeons have featured L3 caches for years.
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_m
- Barcelona is genius, a genuinely new CPU that frees itself entirely of the millstone of the Pentium legacy.
- Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there.
Barcelona is an Opteron, with a doubling of cores and some extensions strapped on here and there.
I'm not meaning to detract from AMD here - the fact that they have still not had to make any radical changes to the Opteron micro-architecture is a testament to the quality of the original design. They are slightly ahead of the game on virtualization - they're going to beat Intel to nested page tables - but other than that this chip is playing catchup. Overall this is going to be a very nice piece of kit to work with. But nothing radical and new here.
G.
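The point above about 128bit SSE operations being split into two 64bit halves can be pictured with a toy model. This is not a description of either vendor's actual pipeline; `packed_add` and its parameters are invented for illustration. The result is identical either way, only the number of passes through the execution unit changes.

```python
def packed_add(a, b, unit_width_bits=128, lane_bits=32):
    """Add two packed vectors lane by lane; return (result, passes).

    `unit_width_bits` is the assumed width of the execution unit. A 64-bit
    unit needs two passes to cover four 32-bit lanes, a 128-bit unit one.
    """
    lanes_per_pass = unit_width_bits // lane_bits
    result, passes = [], 0
    for start in range(0, len(a), lanes_per_pass):
        chunk = slice(start, start + lanes_per_pass)
        result.extend(x + y for x, y in zip(a[chunk], b[chunk]))
        passes += 1
    return result, passes


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]      # four single-precision lanes = 128 bits
    b = [0.5, 0.5, 0.5, 0.5]
    print(packed_add(a, b, unit_width_bits=64))
    print(packed_add(a, b, unit_width_bits=128))
```

Halving the passes is an upper bound on the speedup for this one operation; how much of that shows up in benchmarks depends on everything else in the core.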
This is way more than a mere quad core design. I was hoping to impart that. It is actually dropping some of the legacy x86 architecture internally, and adding big-iron features - the nested paging per core will be a huge plus for businesses that run multiple CPU machines with lots of virtualization for instance.
The separated schedulers for floating and integer math allows for more parallelism, another speed up.
The shared L3 and reduced-latency L2 caches should put Barcelona ahead of Clovertown's split caches, where only 2 cores share a cache; one of the features that made Core 2 Duos faster was that the cache was shared among all cores.
I think the real heart of the matter is going to be that these CPUs will far outshine Intel's best in multi CPU rigs, especially business type rigs. They should be on par for single CPU gaming machines, although I'm going to hedge and say AMD might be a little faster on games that can use more than 1 core. (Almost none at the moment, I know, but they'll start coming out soon)
The STNG episode is similar to Orwell's 1984, where O'Brien, the torturer, shows four fingers before Winston's face. O'Brien increases the pain until Winston says that he sees five fingers; finally, Winston actually imagines five fingers.
There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
A quick check reveals that there is more to it than size: IPC, clock rate, wire delay .
CC.
TaijiQuan (Huang, 5 loosenings)
Rumours have it that their next CPU model will be named 'Real Madrid'...
Check out this benchmark from an article on extremetech:
http://www.extremetech.com/image_popup/0,1694,iid=146710,00.asp
The actual article:
http://www.extremetech.com/article2/0,1697,2014647,00.asp
This isn't surprising at all. I believe it's been well known that the FX chips crunch numbers faster than the Core 2 Duo chips. Notice how the FX-62 almost ties or beats the X6800 in some tests. However, in the benchmarks that matter for most personal computer users, the Core 2 Duo is the right choice, blending the right amounts of power in the right places. The FX looks like the marathon sprinter in some areas, but the Core 2 is the triathlete in others.
But AMD is right, it is no quad core, it is a multichip package of two dual cores. So calling it "quad core" is pure marketing speech because INTEL is lagging behind AMD again. They simply don't have a real quad core yet. So they use those multichip processors to hide the fact that they are behind and even try to create the illusion that it is AMD which is behind.
Realistically you have to compare an INTEL quad core with an AMD dual processor dual core setup. But that is not fair either, because now the AMD rig has two independent memory systems and so has a big advantage on big data sets. But it also needs much more space. So basically the multichip module from INTEL only saves you space on the mainboard, technically it is a dual processor setup with a bad memory path.
And that could hurt AMD's real quad cores, because if everybody in the marketplace associates quad cores with INTEL's multichip module they might severely underestimate the performance AMD's processor (hopefully) has. And that would hurt AMD's sales.
Mr. Coward, in case you have not read the article, the conversation actually is about AMD's new processor, which is a real processor. That processor will generate some amount of heat ... real heat, not theoretical heat.
The conversation might have started out as that, but this thread has gone somewhere else. This is a natural part of any discussion. It does not make it offtopic or irrelevant. Part of the discussion of increasing CPU speeds naturally turns to the increased heat production, and he quite reasonably pointed out some research which, albeit theoretical at this stage, may become more relevant when heat production outstrips cooling capacity for a home computer.
What are you referring to? I never mentioned OOP, relativity, or satalites (sic) in my post.
Do you not understand the example of how theoretical ideas sometimes become working, practical items? Because it is relevant to the discussion at hand (particularly in light of you being a tool about it).
You are a coward but at least you are fair, that's good to know. A fair coward is better than a regular, run-of-the-mill coward, at least in my book. Yes, information carries information. Do you want a Nobel prize for that? O perhaps an honorary PhD degree?
Do you have anything useful to say? Because there's nothing in that rather feeble insult that really has any substance. I can't see the relevance that posting AC has to this discussion one way or another. Criticise opinions, not anonymity (or lack of pseudonymity which is not much different anyway). I don't think that his own admission of making a slightly redundant statement really carried any expectation of winning a Nobel prize. Maybe you want a Nobel prize for making up such impotent arguments? No? Then stfu because it's such a pointless non-statement and it doesn't add anything other than to make you come across as a dick.
And even better computer scientists know how to spell, or at least use a spell checker.
You only made one typo in your post, but it's still one too many when you are criticising other people's. Poor spelling isn't a particularly good way to get a point across, I will be the first to agree. But who cares? It doesn't add or detract from the underlying content.
Your pompous little diatribe however is a little different - as explored above there is no actual content in it other than you having a little wank over your nerdy agenda.
This idea was invented by Shampoo.
The problem I have with performance/watt is that it distorts the true "value" to the system owner. You NEED to break it down, because while power usage is important, the real issue comes down to "is the higher performance WORTH the extra power the chip draws". I personally don't CARE about performance/watt, except when the power draw is excessive, and I believe that is how MOST people will look at it.
Most laptop processors have a higher performance/watt than desktop processors because they are designed with battery life in mind. What people want is a processor that goes faster, but doesn't suck a huge amount of power to get that performance increase. The Pentium 4 got a LOT of flak toward the end with Prescott because the power demand was so far above the benefits that extra power provided. If it were only ten percent more than an Athlon 64 at the time, then no one would have been bothered by it, unless you are talking about a data center where the price for electric power is a very important consideration.
The only reason the fab process improvements are even brought up is because Intel is afraid of AMD. Intel has amazing resources when it comes to money and the ability to pay a lot more into their R&D, but in spite of this, AMD was seen as the performance leader before the Core 2 Duo came out, and AMD has the potential to come back and beat Intel again once K8L is released. It goes to show that if you spend some time looking at how to improve the overall system design and how things fit together, performance will go up a LOT.
It's not the cost of the wafer itself. It's the fact that hundreds of fabrication steps have to be applied individually to each wafer. So the cost of each fabrication step is roughly multiplied by the number of wafers you need to process. Fewer wafers allow for lower costs.
Context-switching has long been the weakest design point for x86 in "PCs", especially servers. x86 arch is rooted in single-user, single-threaded, single-context apps. The in-core registers that CPU operations execute directly against have to be swapped out for each context switch. In *nix, that means every time a different process gets a timeslice, it's got to execute two slow copies between registers and at best cache RAM, at worst offchip RAM (over some offchip bus). If the register count is larger than the bus width (even onchip), that's another multiple on that slow cycle. That context-switch overhead can be larger than the timeslice allocated to each process's "turn" in the schedule for lower-latency / higher-response (lower "nice") processes, approaching realtime.
Unix was designed for multiusers, context-switching from the beginning. The chips it's run on coevolved with it. Linux arrived when x86 CPUs ran fast enough that context-switching was OK, but still a big waste compared with, say, MicroVAX multiple register sets. Windows architecture is rooted in the x86 architecture that DOS was designed for, though perhaps Vista has finally lost all of the old design baggage originated in the 8088/8086, but its long history of UI multitasking means it's context-switching all the time, which will gain in speed. The MacOS switch to BSD means it's got lots of power bound up in the context switches that could be released with Barcelona.
So while low-level benchmarks might show something like 80% FPU improvement, the high level (application) performance could improve quite a lot more. Recompiling apps to machine code that exploits more registers without the context-switching penalties could find multiples, especially apps with realtime multimedia that run concurrently with other apps. Intel's hyperthreading already gets past some of these bottlenecks in distributing tasks among multiple cores, but the Barcelona paging tables go even deeper, for likely extra performance (on top of Barcelona's own hyperthreading and new L3 cache).
Aside from the marketing "vapormarks" we'll surely see out of AMD (and their sockpuppets) before it's actually released "midyear", I'm looking forward to seeing how this thing really runs in multitasking apps. I'm expecting "like a greased snake across a griddle".
--
make install -not war
Intel comes up with some harebrained scheme that "More is better!" (like Viagra). They design something new and decide to make it faster (or in this case just glue more of them together). Back in the day it was the "GHz", now it's all about how many "Cores" you got. This tactic seems to suit Intel quite well and dethrones AMD for about a year and a half... During this time AMD massively redesigns their chips to integrate new, emerging technologies. The gamers and server operators of the world sit by their AMD chips knowing that they might not have the fastest chips for the time being but they are more technologically advanced.
It's so cute when kids who have only been watching x86 CPU development for the last few years try to sound authoritative !
Which part of "2009 timeframe for Larrabee" did you fail to understand? Both AMD and Intel have other things in store for 2 years away and it's not clear which is going to be better. AMD seems to be moving in the same direction of many microcores with its ATi GPUs (google for it on inquirer) and currently they have more experienced engineers on it, so it's a bit premature to discount them. Intel on the other hand has to execute a first-gen GPU on par with ATi's. And I'm curious what nVidia is thinking of all this.
"Hey, boss, we need to buy another 100 machines to support these validation runs. Or we could buy 80 machines of this other brand which will accomplish the same thing and save a lot of money, but they're just not as technologically advanced!"
Actually, from the important perspective of the difficulty of building a new machine around it, the Intel "dual-dual" core chips really are quad core -- they drop into the same socket as the previous dual core chip, placing four cores into the socket. That certainly helped speed the time to market for the chip.
If you mod me down, I shall become more powerful than you could possibly imagine.
...but does it have an IOMMU?
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
Obsession about process size is sillier than obsession over clock speeds.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
If you move to a smaller transistor size you get more processors per wafer (thus it gets much, much cheaper), and you lose less of your wafer to small defects (again, cheaper). Moving data around smaller chips is faster (perhaps better for your workload, possibly not).
It will also use less power to change transistor state and will use less voltage ( though increased leakage until recently minimized this effect).
Now there are dozens of other characteristics but by saying "we've moved to a lower die size" they are pretty much addressing all of your "simple" concerns.
Ahem, you have to see it differently: you can launch way more programs in parallel without having a huge impact on performance. Multicores can go a long way, especially if you do VM stuff and Java development where you have to juggle threads by the dozens...
I will not be surprised if AMD dethrones Intel again. It is a classic Intel vs. AMD battle...
I am not sure Intel ever did beat out AMD.
I went down to Best Buy where the Intel rep was hard peddling a Core 2 Duo machine and compared his $1500 machine to an AMD X2 clearance one for $600. I had nothing to do that day but be a clown, so I went and got a DVD with software on it, and said, "These are both XP, right? Copy the contents to the hard drive and compress it. I am going to measure it." The Core 2 Duo results were almost the same, less than 1% different, at more than twice the price. Not only that, I put my hand over the back of each machine to see how warm the exhaust was. The AMD was noticeably cooler. So I walked out with an AMD X2.
So while my benchmark and testing were less than perfect, it was an end-to-end test factoring in everything, and I still bought AMD. Nice machine too; it runs much cooler than my P4 2.8HT. Certainly a lot faster.
We're not testing the compiler. IMHO, turning optimization OFF would be a fine idea, or at least unobjectionable.
The only important thing is that the compiler choices and options are fair. Using gcc on the Opteron and icc on the Core Duo would not be fair. Using gcc everywhere, with the same options, is completely fair.
One can also define "fair" as "all systems tweaked to the max", but this is rather difficult to do right. (see also: OS benchmarks, where the benchmarker knows all the ways to tweak the OS he uses most often)
Java is big-endian, like the SPARC and G4.
Java has strictly-defined floating-point math that is incompatible with the x86. An x86 chip must save floating-point options out to memory to force the exponent to be the right size.
JIT/emulation systems in general, including Java, do better with more registers. The G4 has about 6x as many once you exclude registers that are unavailable. (about 5 for x86, but at least 30 for the G4)
The proper fix is to run multiple copies of the benchmark.
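To make "run multiple copies of the benchmark" concrete, here is a rough sketch in Python. The workload string is a made-up stand-in, not any real benchmark, and the function names are mine; the idea is only to start N identical single-threaded processes at once and look at how many finish per second of wall-clock time.

```python
import subprocess
import sys
import time

# Hypothetical stand-in workload: a small CPU-bound loop run in a fresh interpreter.
WORKLOAD = "sum(i * i for i in range(200_000))"

def run_copies(copies, workload=WORKLOAD):
    """Start `copies` identical processes at once; return wall-clock seconds until all finish."""
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", workload]) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(codes):
        raise RuntimeError(f"a benchmark copy failed: {codes}")
    return elapsed

def throughput(copies, workload=WORKLOAD):
    """Completed copies per second of wall-clock time."""
    return copies / run_copies(copies, workload)

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} copies: {throughput(n):.2f} runs/sec")
```

On a machine with spare cores, throughput should keep climbing as copies are added, even though each copy is single-threaded. Process startup time is included here, so a real test would want a much longer workload.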
I'm using Linux, with single-threaded apps, but so what? I run lots of things at once:
X, window manager, xterm, editor -- that is 4, plus the kernel
X, xterm, tar, gzip -- that is 4, plus the kernel
X, xterm, make, bash, cc1, cc1, cc1, gas, gas, ld... -- that's a lot of things!
The main use of vector units is running crappy Windows gamer "benchmarks" and MacOS Photoshop "benchmarks". The games don't even use the vector units all that much. It's just the benchmarks that use the vector units.
In the real world, vector units aren't good for much at all. You can do radar processing with them, but that isn't exactly a desktop app. Linux can use them for software RAID.
You allude to having a little more insight than most on this subject, but let me just say... DUH! I'm fairly sure that after 4 cores (which Intel can do now) 8 cores will be next, and once they have that under their belt, 16 cores here we come!
Oh, here's one. Though it's been out since before Intel had quad-core chips.
Not to mention its availability as an open-source chip design.
But: it may be 8 core, but it lacks out-of-order execution, and each core can't perform at full speed without 4 threads, so it's not much better in terms of performance than a 32-core processor running at a quarter of its speed. This isn't bad for some workloads, but for others it's a nightmare.
> ...will work even better...
If it works at all... maybe. Meandering capacitances between traces, variable leakage currents and higher susceptibility to electromigration mean that a substantial rework of the masks is required in order to move from 65 to 45nm.
Yes, you can make it faster, and yes you can make it cheaper, but you can't make it as reliable, nor can you make it use a higher proportion of its power for computationally useful activities -- not without a substantial change in architecture, using error correction, reversible logic, self-timed circuits, &c.
So far Intel's superior process tech and AMD's superior architecture have passed the performance crown back and forth for several years, but it remains to be seen how long this condition will remain cyclically stable. I rather expect architecture to trump process in the next iteration, as AMD's partnership with IBM is likely to bring process nearer to parity, while Intel's architecture group is simply not competitive -- and revamping their staffing enough to make the kind of revolutionary change needed to recapture the lead would be too risky.
-I like my women like I like my tea: green-
Larrabee is planned for 2009.
By then, both ATI/AMD and nVidia have plans to introduce their own "multiple chips on single card" family.
The current (for nVidia) / next (for ATI/AMD) line of DirectX 10 cards will be the last of the single super-GPU designs, where different models in the line differ only by the clock speed and the disabled pipelines.
From G90 and R700 onward, which will probably appear around 2008-2009 too, the graphics card will look the same way as those multi-chip cards pioneered by 3DFX (Voodoo 4 : 1 VSA, Voodoo 5 5000/5500 : 2 VSA, Voodoo 5 6000 : 4 VSA, AAlchemy's custom PCI boards : Fucking-8 VSA on a single oversized card).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Until they hit that wonderful FSB wall....
I know they're working on a solution to that.
The cesspool just got a check and balance.
Intel applied concerted effort to kicking its own nuts during the P4 years. If AMD couldn't deliver a knockout blow then, it sure as hell won't now that Intel has a real lineup.
Not the instruction set, but the binary encoding of it.
Let's say a system encodes each instruction in a single byte where the high nibble is the command (0Bxh are all moves) and the low nibble is the register (0x3h all manipulate register C; therefore opcode 0B3h moves something into register C).
You can move this machine from 8 registers to 16. But if you want to extend it to 32 registers, you'll have to use 2 bytes to encode them (first byte command, second byte register; or the same opcodes as before in the second byte, with the first byte being a prefix that says "use the upper 16 registers").
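A toy version of the prefix scheme described above, sketched in Python. This is the made-up nibble machine from the comment, not real x86 or AMD64 encoding; the prefix value 0Fh and the rule that command 0 is reserved for prefixes are my own assumptions for illustration.

```python
MOVE = 0xB          # high nibble for "move", as in the 0Bxh example
UPPER_BANK = 0x0F   # hypothetical prefix byte: next instruction uses registers 16-31

def encode(command, register):
    """Encode one instruction; registers 16-31 cost one extra prefix byte."""
    if not 1 <= command <= 0xF:
        raise ValueError("command must fit in the high nibble (0 is reserved for prefixes)")
    if not 0 <= register < 32:
        raise ValueError("this toy machine has 32 registers")
    body = bytes([(command << 4) | (register & 0xF)])
    return bytes([UPPER_BANK]) + body if register >= 16 else body

def decode(code):
    """Decode a byte string into a list of (command, register) pairs."""
    out, bank = [], 0
    for byte in code:
        if byte == UPPER_BANK:
            bank = 16           # prefix applies to the next instruction only
            continue
        out.append((byte >> 4, (byte & 0xF) + bank))
        bank = 0
    return out

if __name__ == "__main__":
    program = encode(MOVE, 3) + encode(MOVE, 19)
    print(program.hex(), decode(program))
```

Old one-byte code still decodes unchanged, but every instruction touching the upper registers is twice as long, which is the code-size cost being discussed.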
Now taking the example of the x86 to AMD64 extension, moving from 8 to 32 registers might have increased the average opcode size from 3 bytes to 5 bytes, which would have the following disadvantages:
- Same code eats up a lot more memory in 64-bit mode.
- Less code in cache
- Maybe slower instruction decoding stage in pipeline
- Code starts to look a lot different from legacy code (opcodes for 32-bit instructions are not the same binary as their equivalent counterparts in 64-bit mode), which may eat up more silicon real estate (having 2 different instruction decoders instead of a single one. For example, when 32-bit mode was introduced with the 386, most of the opcodes in 16-bit mode and 32-bit mode were the same; only a prefix was available to specify when code used the "other" size (16 bits in 32-bit code or vice versa))
Some ARMs have dual-mode opcodes: either opcodes coded on 32 bits (4 bytes per opcode) or Thumb mode (2 bytes per opcode). In Thumb mode, not all functionality is available (not all registers are usable for some instructions; only jumps are conditional, not arithmetic). To get the full capability, 32-bit mode has to be used.
For ARM (the R means RISC), the instruction set is small and the decoding isn't complicated and doesn't eat much space, so adding a second "Thumb" mode for denser code isn't that hard and doesn't eat much space (especially because Thumb mode is less complex).
For AMD64 maybe it's not that easy, because the x86 opcodes are already a mess (thanks to all the legacy, including some dating back to the 8080 8-bit predecessor [not binary compatible but at least source-code compatible]), and implementing alternate opcodes, completely rethought from the beginning, for 64-bit mode code may have used too much silicon (it's already nice that they cleaned up the memory model for 64-bit mode).
But are those costs equal for the 45nm versus 65nm versions?
And then finally, the Joe Six-Pack user will be able to run his current work (browsing for porn?), the resource-eating Microsoft OS, and the huge pile of spambots/spyware/viruses/Sony rootkits/trojans/etc., all together, without the main task having to be switched out by the multitasking scheduler in favor of the others, and thus for the first time won't observe such a massive slowdown 2 days after the anti-virus license expired...
Therein lies the humorous part. Intel's "real lineup" is nothing more than a few minor tweaks and the application of modern processes to 10 year old tech. You might be able to whip the buggy harder, add a second horse, but eventually the locomotive just wears you out as it passes you. (Even if it's quarter-scale)
As for delivering a knockout blow with the P4, I wouldn't count my chickens just yet. AMD's offerings still smoke Intel's in the server market, esp. anything over 2 CPUs.
If all you want is a cheap powerful processor to put into your gaming PC, then yeah, you don't care who dethrones who. But if you work in the industry (and right now I'm writing documentation for an Opteron-based HPC system) then the Intel-AMD struggle is very interesting indeed. Every time AMD scores an upset over Intel, the whole marketplace changes.
And while you may be content to buy cheap technology 6 months after it's introduced, not everybody who buys hardware has that luxury. If you're spending 6 or 7 figures for a rack full of high-performance computers, then every little twitch in performance or pricing makes a big difference to your bottom line.
The costs don't need to be equal to pay off. Keeping other factors constant, anything less than (65/45)^2 (about 2X) would cost less per CPU.
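The (65/45)^2 figure can be checked with a few lines of Python. This is an idealized sketch: it assumes every linear dimension shrinks with the node, the same yield on both processes, and no wafer-edge losses, none of which holds exactly in practice. The function names are mine.

```python
def area_ratio(old_nm, new_nm):
    """Ideal die-area ratio when every linear dimension scales with the process node."""
    return (old_nm / new_nm) ** 2

def cost_per_chip_ratio(wafer_cost_ratio, old_nm=65, new_nm=45):
    """New cost per chip relative to the old one, given the new/old wafer cost ratio."""
    return wafer_cost_ratio / area_ratio(old_nm, new_nm)

if __name__ == "__main__":
    print(f"dies per wafer scale by about {area_ratio(65, 45):.2f}x")
    for wafer_ratio in (1.0, 1.5, 2.5):
        print(f"wafer costs {wafer_ratio}x -> chip costs {cost_per_chip_ratio(wafer_ratio):.2f}x")
```

Under these assumptions, a 45nm wafer that costs 1.5 times as much still yields cheaper chips, while one that costs 2.5 times as much does not.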
If 65nm, 45nm or 10mm consumed the same amount of power or were equally susceptible to random errors, then from a consumer standpoint I would agree with you. But that is not the case. The smaller process allows for an equal amount of computing power to be done with less electrical power. For instance, let's say the 65nm processor uses 50 watts, and the 45nm process uses 45 watts. That may not seem like much, but let's assume that you now apply that to 1 million PCs that use the proc 24x7 for three years. This results in:
- Lbs CO2: 134,405,700
- Tons CO2: 67,202
- Equivalent in cars: 3,875
- Equivalent gallons of gas: 6,870,139
- Acres of trees needed to offset: 6,109.63
- Mature trees needed to offset: 2,986,793
- Trees planted to offset: 716,830
If you just want to look at what a person would save for running 1 new proc for the same 3 year period, it results in:
- Equivalent gallons of gas: 6.87
- Mature trees needed to offset: 2.99
So even if you just purchase 1 chip, it will save you about $18 over the 3 years and will be more eco-friendly. In addition, lower power chips tend to be higher in reliability, so the average consumer is less likely to have to pay for a repair over the same 3 year period, which also has a tangible value. So all in all you should care about the new processes.
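For anyone who wants to check the arithmetic, here is a short Python sketch. The 5 watt difference, the 1 million PCs, the three years, the $18 and the CO2 total all come from the comment above; the electricity price and the CO2-per-kWh factor are not stated there, so the code only backs them out from those totals. Leap days are ignored.

```python
HOURS = 24 * 365 * 3   # three years of 24x7 operation, ignoring leap days

def kwh_saved(watts_saved, hours=HOURS):
    """Energy saved in kilowatt-hours for a constant power difference."""
    return watts_saved * hours / 1000.0

per_pc_kwh = kwh_saved(50 - 45)          # one PC over three years
fleet_kwh = per_pc_kwh * 1_000_000       # the 1 million PC case

# Values implied by the comment's totals, not stated in it:
implied_price_per_kwh = 18 / per_pc_kwh
implied_lbs_co2_per_kwh = 134_405_700 / fleet_kwh

if __name__ == "__main__":
    print(f"per PC: {per_pc_kwh:.1f} kWh, fleet: {fleet_kwh:,.0f} kWh")
    print(f"implied price: ${implied_price_per_kwh:.3f}/kWh")
    print(f"implied emissions: {implied_lbs_co2_per_kwh:.3f} lbs CO2/kWh")
```

So the $18 figure corresponds to roughly 14 cents per kWh and the CO2 total to roughly one pound per kWh; whether those factors fit your local grid is a separate question.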