AMD Showcases Quad-Core Barcelona CPU
Gr8Apes writes "AMD has showcased their new 65nm Barcelona quad-core CPU. It is labeled a quad-core Opteron, but according to InfoWorld's Tom Yager, it is really a redefinition of x86. Each core has a new vector math processing unit (SSE128), separate integer and floating-point schedulers, and new nested paging tables (to vastly improve hardware virtualization). According to AMD, the new vector math units alone should improve floating-point performance by 80%. Some analysts are skeptical, waiting for benchmarks. Will AMD dethrone Intel again? Only time will tell."
Then this could be a neat little chip.
You mad
Dethroning happens all the time. We're talking about 65nm when we're already gearing up for 45. Wake me up when either company does 1nm processors.
Things have come a long way since the heady days of bit slice processors. The first microcode I wrote was for an XOR operation - I could not think of anything simpler, that would actually do something useful...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Anyone know what "SSE128" means? SSE registers have been 128 bit from day one.
Zonk, didn't you get the memo? This slashvertisement belongs in the Special advertising section
"Will AMD dethrone Intel again?" Dear AMD, meet Larrabee. http://www.theinquirer.net/default.aspx?article=37548
AMD might kick Intel in the nuts a little but definitely not dethrone.
As long as AMD and Intel continue to chase each other in the x86 market, high end chips become low end in the span of six months. Just keep buying 6 months behind the press releases and you get great processors for next to nothing.
Weaselmancer
rediculous.
"Lets make a Octa-core processor!"
I read the Tom Yager piece and I started drooling so much I shorted out my computer.
I'll just hit the Submit button now and see if I can get this posted to Slashdot before the drool shorts out this computer too.
Am I missing something or am I completely wrong?
Keeping to scientific fact, how much heat has to be generated for 1 MIPS?
The fact is, absolutely none. It has been shown that only the destruction of information, via AND and similar instructions, creates entropy (heat). As long as you use only 3 types of gates (pass-through, NOT, XOR), you can create a heat-free CPU. Provided we do want to check for bit errors, we could maintain very low heat via ECC-like checking. Estimates on that are 10^8 times lower than present.
We could keep 98% of our efficiency of current day chips if we switched to this method.
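A rough way to see the distinction being drawn here is to check which gates are one-to-one on their inputs. The sketch below is only an illustration, not anything from the article: the function names are made up, the Toffoli gate is an addition not mentioned in the comment (it is the usual way to get an AND-like result without discarding inputs), and the last function evaluates the Landauer bound of kT ln 2 per erased bit.

```python
from itertools import product
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant in joules per kelvin

def and_gate(a, b):
    """Ordinary AND: two bits in, one bit out, so inputs cannot be recovered."""
    return (a & b,)

def cnot(a, b):
    """Controlled NOT (an XOR that keeps its control bit): a permutation of 2-bit states."""
    return (a, a ^ b)

def toffoli(a, b, c):
    """Toffoli gate (assumed addition): writes a AND b into c while keeping a and b."""
    return (a, b, c ^ (a & b))

def is_reversible(gate, n_inputs):
    """A gate is reversible if every distinct input gives a distinct output."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs

def landauer_joules_per_bit(temp_kelvin):
    """Minimum heat released when one bit is erased at the given temperature."""
    return BOLTZMANN_J_PER_K * temp_kelvin * math.log(2)

if __name__ == "__main__":
    print("AND reversible:    ", is_reversible(and_gate, 2))
    print("CNOT reversible:   ", is_reversible(cnot, 2))
    print("Toffoli reversible:", is_reversible(toffoli, 3))
    print("Landauer bound at 300 K (J/bit):", landauer_joules_per_bit(300.0))
```

The bound only applies to bits that are actually erased; it says nothing about the resistive and leakage losses that dominate real chips.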
....is in the eating. Until we see benchmarks everything is just ignorant speculation.
In my own benchmarks (generic C integer and floating point scientific code) I have found that the Core Duo and Core 2 Duo aren't all that quick compared with an AMD64. Clock for clock the AMD64 Opterons we have are about 50% quicker than an equivalent Core 2 Duo for integer work. I know this doesn't agree with all the usual magazine benchmarks but they are heavily biased towards using SSE instructions where possible and it is SSE where the Core 2 Duo has been a real improvement over previous Intel designs and also bests the AMD chips. Hopefully, AMD has recognised this and the new SSE implementation will bring them back on par with Intel for these benchmarks but even today an AMD64 processor is a beast and more than a match for anything Intel produces.
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
come on, amd guys!
get the hardware encryption support from VIA licensed and include it; that will be needed in the next years too!
But will it run Linux?
Tubby or not tubby. Fat is the question
No, but a good, hard, well-aimed, holding-nothing-back kick in the nuts can leave them impotent, so they'll have to undergo some ugly procedures to survive it in the long run. A couple of identical blows in the meantime could leave them sterile, so if the current setups begin to die out and Intel has no more babies waiting, they will not be dethroned, but they will get an honorable mention in the history books.
If you don't like my sig then don't read it.
Clock speed doesn't mean crap anyways. It's all in the code. I see guitar tuning programs for the computer... TEN megs in size, slow as hell, and inaccurate! I believe APTuner is FAR smaller than most, faster and far more accurate. People just don't know how to code, plus the fastest ways to code are copyrighted, which they shouldn't be since they'd be utterly obvious to any programmer with that standard "ordinary" knowledge in that language, so one has to make workarounds that inevitably end up being slower. No more oldskool hacker ethic, now it's greed.
Read the article: that is an x86 GPU; it wouldn't be able to compete with general-purpose CPUs.
I don't think I've ever read such an admiring review of a CPU design. The last time I remember a chip sounding so fantastic was the Alpha, something like ten years ago. If all the new things really work the way they sound in theory, then yeah, I guess it's evident this Barcelona thing is really going to be something else.
The design for VW performance sounds extra interesting
AMD showcases quad core Barcelona CPU? This is old news from last year. The PCWorld article linked to by this story is dated November 30, 2006.
I want a floating quad.
sometimes, nothing.
I'm not kidding. In the SSE I'm familiar with, one of the input registers is always an output register, which means its contents are destroyed. Another flaw is that there aren't enough registers... SSE uses 8, where 32 are commonly not enough when latency is longish (especially with SoA-style programming, where pragmatically a single vec3 occupies 3 128-bit registers).
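To make the register-pressure point concrete, here is a small model of SoA-style vec3 math, assuming four 32-bit floats per 128-bit register. Plain Python lists stand in for registers, and every name is made up for illustration; this is a sketch of the data layout, not real SSE code.

```python
LANES = 4  # one 128-bit SSE register holds four 32-bit floats

def to_soa(vectors):
    """Transpose four (x, y, z) tuples into three 4-lane 'registers': xs, ys, zs."""
    assert len(vectors) == LANES
    xs, ys, zs = (list(component) for component in zip(*vectors))
    return xs, ys, zs

def lane_mul(a, b):
    """Lane-wise multiply, standing in for a packed multiply instruction."""
    return [p * q for p, q in zip(a, b)]

def lane_add(a, b):
    """Lane-wise add, standing in for a packed add instruction."""
    return [p + q for p, q in zip(a, b)]

def dot4(a_soa, b_soa):
    """Four dot products at once. The inputs alone occupy six registers."""
    ax, ay, az = a_soa
    bx, by, bz = b_soa
    return lane_add(lane_add(lane_mul(ax, bx), lane_mul(ay, by)), lane_mul(az, bz))

if __name__ == "__main__":
    a = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0), (1.0, 0.0, 0.0)]
    b = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 1.0, 0.0)]
    print(dot4(to_soa(a), to_soa(b)))
```

With only eight registers, the six inputs plus a temporary or two leave almost nothing free, and the destructive two-operand form means some inputs get overwritten unless they are copied first.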
... or Madd. You know, multiply-add. Does it have that?
I will not be surprised if AMD dethrones Intel again. It is a classic Intel vs. AMD battle...
Intel comes up with some hare-brained "More is better!" scheme (like Viagra). They design something new and decide to make it faster (or in this case just glue more of them together). Back in the day it was the "GHz"; now it's all about how many "Cores" you've got. This tactic seems to suit Intel quite well and dethrones AMD for about a year and a half... During this time AMD massively redesigns their chips to integrate new, emerging technologies. The gamers and server operators of the world sit by their AMD chips knowing that they might not have the fastest chips for the time being but they are more technologically advanced.
Intel keeps cranking out their "Viagra" chips that become hotter and bigger energy hogs. When they finally come to the realization that their product SUCKS (as with the P4s), AMD is already a step ahead and swiftly takes the market back with their chips, which have always been a step ahead technologically. Intel scrambles, does a redesign, fails, tries again and history repeats. Intel's "Viagra" mentality causes their chips to be more expensive while AMD's "slow and steady wins the race" mentality allows them to keep their prices down.
--
Get your facts first then distort them as you please -Mark Twain
So now I'll see four penguins at startup!
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
What really interests me is how it compares on single- and double-precision calculations. If AMD gets into the range of Itanium performance, will Intel follow and kill their own Itanium by boosting Core 2 FP?
Well, until you show us your source code those numbers are as believable as anything else one might randomly type here...
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Look. He's talking about something interesting, and the conversation started on the theoretical limits of computation. Thus being in theory land is not unreasonable. Unless you actually have something interesting to say with some actual data to back you up, quit bashing someone who actually DOES have ideas and knows what the fuck they're talking about.
A few years ago people would've said that it was impossible to pull data from below the noise floor in radio channels, but we do it routinely now. People said that OOP was just theoretical wankery, but now Java is (to my dismay) one of the most popular languages around. Same for relativity, but we have satellites. Where do you think processors started? You think someone went out and said "I think I'll make a few-micron-thick wafer of silicon and get a processor"? All of our advancements in science have started out as theoretical concepts, and slowly they've become practiced, then practical, and then commonplace.
To be fair, I suspect that the resulting information from the computation itself carries... well... information. As all good computer scientists know, information has a direct correlation to entropy, and in fact this is how we define randomness. How much entropy is a simple function of the information actually obtained (i.e. how accurately we can predict each bit). Given how closely all of these things are tied, I would be surprised if this entropy didn't somehow "count" towards the physical entropy of the system. Still, if it took a long, complex computation to get those bits, it's not that much entropy lost to information (say it's a yes/no answer to an exponentially difficult problem on a large N). It SEEMS like if you reverse the process it should take energy to do so. But I don't know; it also seems like light shouldn't interfere with light, and dark matter should run into other dark matter, and you shouldn't be able to do computation by repeatedly not looking at bits until you have what you want (i.e. quantum computation, which we can already do, just only 5 bits at a time). Who knows... maybe this is feasible. Unless you've got the physical background to prove him wrong, or can make a useful contribution, quit pointlessly spewing shit at people who have useful things to say.
Cool - quad core CPU and my internet connection is still in the dark ages.
Yep, 800Kbps down. Looks like my CPUs are going to sit twiddling their thumbs for a while, waiting for the data.
Is this yet another "computers are like cars" comparison?
It depends on what kind of data you are processing. If you are doing 32-bit calculations then you would want to compile your code for 32-bit, assuming your processor can handle it, as most 64-bit CPUs can. If you are using 64-bit calculations then of course the 64-bit CPU would outperform the 32-bit one, as you would have to do additional coding steps to simulate 64-bit on a 32-bit architecture: multiple 32-bit operations with bit shifting and the like.
If you took code that was written for 32-bit operations and compiled that code for 64-bit then the results would be very bad. The same issues were present when computers made the move from 16 to 32 bit.
I would assume that the code written for these tests is 32-bit. If so, then when evaluating a CPU one would have to take into consideration whether 64-bit calculations are needed. Also, before jumping the gun and making a rash decision, some research would need to be done to see if your operating system of choice supports true 64-bit operations. Many of the so-called 64-bit operating systems only use the 64-bit CPU to support memory beyond 4GB.
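The "multiple 32-bit operations with bit shifting" mentioned above can be sketched as a 64-bit add built from 32-bit halves. This is only a model: Python integers are unbounded, so the carry is recovered with a shift where real 32-bit hardware would use its carry flag, and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keeps a value inside one 32-bit word

def split64(value):
    """Split a 64-bit value into (high word, low word)."""
    return (value >> 32) & MASK32, value & MASK32

def add64_with_32bit_ops(a, b):
    """Add two 64-bit values using only 32-bit-wide pieces, wrapping on overflow."""
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32            # 1 if the low words overflowed, else 0
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo

if __name__ == "__main__":
    print(hex(add64_with_32bit_ops(0x00000000FFFFFFFF, 1)))
```

One native 64-bit add becomes two adds plus carry handling, which is the extra work the comment is describing.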
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
Can we start rejecting 'scoops' that sound like a radio/TV demolition derby or monster-truck madness advertisement?
v4sw6PU$hw6ln6pr4F$ck 4/6$ma3+6u7LNS$w2m4l7U$i2e4+7en6a2X h
- Each of Barcelona's four cores incorporates a new vector math unit referred to as SSE128
SSE has always been 128-bit (the 64-bit SIMD extensions were called MMX). AMD used to funnel the instructions through a 64-bit execution unit by splitting the work into two halves; the new core has a full 128-bit SSE pipeline, so it doesn't need to split the operations. Nothing new here, just a faster internal implementation. Can this deliver an 80% improvement in benchmark performance? Quite possibly. Take a look at the Core 2 FP performance numbers: it also has a full 128-bit implementation of SSE.
- And separating integer and floating-point schedulers also accelerates this thing called virtualization
Huh. Hardware virtualization affects how the processor handles certain instructions such as privileged operations. FP instruction execution is unaffected. Virtualized workloads will benefit no more than non-virtualized workloads. Separate issue queues are good, but do they specifically benefit virtualization? No.
- Barcelona blacks out power to individual portions of the chip that are idled, from in-core execution units to on-die bus controllers. This hasn't made it into PCs before
...
Intel calls this 'intelligent power capability'. http://www.intel.com/technology/magazine/computin
- Barcelona adds Level 3 cache, a newcomer to the x86
Xeons have featured L3 caches for years. http://en.wikipedia.org/wiki/List_of_Intel_Xeon_m
- Barcelona is genius, a genuinely new CPU that frees itself entirely of the millstone of the Pentium legacy.
- Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there.
Barcelona is an Opteron, with a doubling of cores and some extensions strapped on here and there.
I'm not meaning to detract from AMD here - the fact that they have still not had to make any radical changes to the Opteron micro-architecture is a testament to the quality of the original design. They are slightly ahead of the game on virtualization - they're going to beat Intel to nested page tables - but other than that this chip is playing catchup. Overall this is going to be a very nice piece of kit to work with. But nothing radical and new here.
G.
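The SSE128 point in the comment above (same 128-bit instruction, executed either as two 64-bit halves or in one full-width pass) can be modelled roughly as follows. The pass counts are an illustrative stand-in, not cycle-accurate figures for any AMD or Intel core, and the function names are invented for this sketch.

```python
def add_full_width(a, b):
    """Add four lanes in a single pass, as a full 128-bit execution unit would."""
    return [x + y for x, y in zip(a, b)], 1

def add_split_halves(a, b):
    """Add the same four lanes as two 64-bit halves (two lanes per pass)."""
    result = []
    passes = 0
    for start in (0, 2):
        result += [x + y for x, y in zip(a[start:start + 2], b[start:start + 2])]
        passes += 1
    return result, passes

if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    print(add_full_width(a, b))
    print(add_split_halves(a, b))
```

Both paths produce identical results, so software sees no new instructions; only the number of internal passes changes.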
The STNG episode is similar to Orwell's 1984 where O'Brien, the torturer, shows four fingers before Winston's face, O'Brien increases the pain until Winston says that he sees five fingers, finally, Winston actually imagines five fingers.
There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
Rumours have it that their next CPU model will be named 'Real Madrid'...
Check out this benchmark from an article on extremetech:
http://www.extremetech.com/image_popup/0,1694,iid=146710,00.asp
The actual article:
http://www.extremetech.com/article2/0,1697,2014647,00.asp
This isn't surprising at all. I believe it's been well known that the FX chips crunch numbers faster than the Core 2 Duo chips. Notice how the FX-62 almost ties or beats the X6800 in some tests. However, in the benchmarks that matter for most personal computer users, the Core 2 Duo is the right choice, blending the right amounts of power in the right places. The FX looks like the marathon sprinter in some areas, but the Core 2 is the triathlete in others.
Is the slashdot story queue this long? Or how much did AMD pay to get this old article reposted to try and keep the last two people who haven't jumped ship to Intel Core 2 Duo around?
Context-switching has long been the weakest design point for x86 in "PCs", especially servers. x86 arch is rooted in single-user, single-threaded, single-context apps. The in-core registers that CPU operations execute directly against have to be swapped out for each context switch. In *nix, that means every time a different process gets a timeslice, it's got to execute two slow copies between registers and at best cache RAM, at worst offchip RAM (over some offchip bus). If the register count is larger than the bus width (even onchip), that's another multiple on that slow cycle. That context-switch overhead can be larger than the timeslice allocated to each process's "turn" in the schedule for lower-latency / higher-response (lower "nice") processes, approaching realtime.
Unix was designed for multiusers, context-switching from the beginning. The chips it's run on coevolved with it. Linux arrived when x86 CPUs ran fast enough that context-switching was OK, but still a big waste compared with, say, MicroVAX multiple register sets. Windows architecture is rooted in the x86 architecture that DOS was designed for, though perhaps Vista has finally lost all of the old design baggage originated in the 8088/8086, but its long history of UI multitasking means it's context-switching all the time, which will gain in speed. The MacOS switch to BSD means it's got lots of power bound up in the context switches that could be released with Barcelona.
So while low-level benchmarks might show something like 80% FPU improvement, the high level (application) performance could improve quite a lot more. Recompiling apps to machine code that exploits more registers without the context-switching penalties could find multiples, especially apps with realtime multimedia that run concurrently with other apps. Intel's hyperthreading already gets past some of these bottlenecks in distributing tasks among multiple cores, but the Barcelona paging tables go even deeper, for likely extra performance (on top of Barcelona's own hyperthreading and new L3 cache).
Aside from the marketing "vapormarks" we'll surely see out of AMD (and their sockpuppets) before it's actually released "midyear", I'm looking forward to seeing how this thing really runs in multitasking apps. I'm expecting "like a greased snake across a griddle".
--
make install -not war
Which part of "2009 timeframe for Larrabee" did you fail to understand? Both AMD and Intel have other things in store for 2 years away and it's not clear which is going to be better. AMD seems to be moving in the same direction of many microcores with its ATi GPUs (google for it on the Inquirer) and currently they have more experienced engineers on it, so it's a bit premature to discount them. Intel on the other hand has to execute a first-gen GPU on par with ATi's. And I'm curious what nVidia is thinking of all this.
Actually, from the important perspective of the difficulty of building a new machine around it, the Intel "dual-dual" core chips really are quad core -- they drop into the same socket as the previous dual core chip, placing four cores into the socket. That certainly helped speed the time to market for the chip.
If you mod me down, I shall become more powerful than you could possibly imagine.
Since when did Slashdot start using a known fanboy's quote to help get eyeballs? Tom Yager has been known to preach his AMD cult for a long, long time. He frequently talks at AMD-sponsored seminars. In his InfoWorld column, he can't last a week without praising AMD for the next breakthrough or putting clumsy spins on AMD's bad news. Like John Dvorak and Robert Scoble, this guy has zero credibility. Quoting him is like quoting all the telecom analysts when NASDAQ was at 5000.
On another thought, if Slashdot is playing the whole good/AMD/Apple/Linux vs. evil/Intel/Microsoft/Windows game, Tom Yager can be one of the official trashtalkers.
for multi-core CPUs! Launch a monitor and watch your CPUs. Benchmarking a multi-core CPU with a single-threaded program is really broken. It is really pathetic to see all those single-threaded programs run on my Core 2 Duo. I'm using Linux and monitoring the CPUs using gkrellm. It's really sad to see all those single-threaded programs from the 90's being twice as slow as they should be (and four times as slow on a quad-core, etc.).
...but does it have an IOMMU?
We're not testing the compiler. IMHO, turning optimization OFF would be a fine idea, or at least unobjectionable.
The only important thing is that the compiler choices and options are fair. Using gcc on the Opteron and icc on the Core Duo would not be fair. Using gcc everywhere, with the same options, is completely fair.
One can also define "fair" as "all systems tweaked to the max", but this is rather difficult to do right. (see also: OS benchmarks, where the benchmarker knows all the ways to tweak the OS he uses most often)
Java is big-endian, like the SPARC and G4.
Java has strictly-defined floating-point math that is incompatible with the x86. An x86 chip must save floating-point options out to memory to force the exponent to be the right size.
JIT/emulation systems in general, including Java, do better with more registers. The G4 has about 6x as many once you exclude registers that are unavailable. (about 5 for x86, but at least 30 for the G4)
The proper fix is to run multiple copies of the benchmark.
I'm using Linux, with single-threaded apps, but so what? I run lots of things at once:
X, window manager, xterm, editor -- that is 4, plus the kernel
X, xterm, tar, gzip -- that is 4, plus the kernel
X, xterm, make, bash, cc1, cc1, cc1, gas, gas, ld... -- that's a lot of things!
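A minimal way to try the "run multiple copies" approach is to launch the same single-threaded job as several independent processes and time the batch. The workload below is a hypothetical stand-in chosen only so the sketch runs quickly; with a job this small, interpreter start-up dominates, so swap in a real benchmark before reading anything into the numbers.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a single-threaded, CPU-bound benchmark.
BENCH_SNIPPET = "sum(i * i for i in range(200_000))"

def run_copies(n_copies):
    """Start n_copies independent interpreter processes and wait for all of them."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", BENCH_SNIPPET])
        for _ in range(n_copies)
    ]
    exit_codes = [proc.wait() for proc in procs]
    return exit_codes, time.perf_counter() - start

if __name__ == "__main__":
    for n in (1, 2, 4):
        codes, elapsed = run_copies(n)
        print(f"{n} copies: {elapsed:.3f}s, exit codes {codes}")
```

If wall time stays roughly flat as the copy count rises to the number of cores, the extra cores are doing useful work even though each copy is single-threaded.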
The main use of vector units is running crappy Windows gamer "benchmarks" and MacOS Photoshop "benchmarks". The games don't even use the vector units all that much. It's just the benchmarks that use the vector units.
In the real world, vector units aren't good for much at all. You can do radar processing with them, but that isn't exactly a desktop app. Linux can use them for software RAID.
Larrabee is planned for 2009.
By then, both ATI/AMD and nVidia have plans to introduce their own "multiple chips on a single card" families.
The current (for nVidia) / next (for ATI/AMD) line of DirectX 10 cards will be the last of the single super-GPU designs, where different models in the line differ only by clock speed and disabled pipelines.
From G90 and R700 onward, which will probably appear around 2008-2009 too, graphics cards will look like the multi-chip cards pioneered by 3DFX (Voodoo 4: 1 VSA, Voodoo 5 5000/5500: 2 VSA, Voodoo 5 6000: 4 VSA, AAlchemy's custom PCI boards: fucking 8 VSA on a single oversized card).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Intel applied concerted effort to kicking its own nuts during the P4 years. If AMD couldn't deliver a knockout blow then, it sure as hell won't now that Intel has a real lineup.
Not the instruction set, but the binary coding of.
Let's say a system encodes some instruction in a single byte where the high nibble is the command (0Bxh are all moves) and the low nibble is the register (0x3h all manipulate register C; therefore opcode 0B3h moves something into register C).
You can move this machine from 8 registers to 16. But if you want to extend to 32 registers, you'll have to use 2 bytes to code them (first byte command, second byte register; or the same opcodes as before in the second byte, with the first byte being a prefix that says "use the upper 16 registers").
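The toy encoding described above can be sketched like this. It is purely illustrative and is not the real x86 or AMD64 encoding: the prefix value 0xF0 is a made-up choice, and command nibble 0xF is reserved in this sketch so that a prefix byte can never be mistaken for an ordinary opcode.

```python
PREFIX_UPPER_BANK = 0xF0  # hypothetical prefix byte meaning "use registers 16-31"

def encode(command, register):
    """Encode a command (0x0-0xE) and register (0-31) into one or two bytes."""
    if not 0 <= command <= 0xE:
        raise ValueError("command nibble 0xF is reserved for the prefix in this sketch")
    if not 0 <= register < 32:
        raise ValueError("register must be in 0..31")
    opcode = (command << 4) | (register & 0xF)
    if register < 16:
        return bytes([opcode])                 # legacy one-byte form
    return bytes([PREFIX_UPPER_BANK, opcode])  # prefixed two-byte form

def decode(code):
    """Return (command, register) for a one- or two-byte instruction."""
    if code[0] == PREFIX_UPPER_BANK:
        opcode, bank = code[1], 16
    else:
        opcode, bank = code[0], 0
    return opcode >> 4, (opcode & 0xF) + bank

if __name__ == "__main__":
    print(encode(0xB, 3).hex())   # move into register 3, one byte
    print(encode(0xB, 19).hex())  # move into register 19, prefix plus the same opcode
```

Old one-byte code keeps decoding unchanged, while anything that touches the upper registers costs an extra byte, which is the code-density trade-off under discussion.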
Now taking the example of the x86 to AMD64 extension, moving from 8 to 32 registers might have increased the average opcode size from 3 bytes to 5 bytes, which would have the following disadvantages:
- The same code eats up a lot more memory in 64-bit mode.
- Less code fits in cache.
- Maybe a slower instruction decoding stage in the pipeline.
- Code starts to look a lot different from legacy code (opcodes for 32-bit instructions are not the same binary as their equivalent counterparts in 64-bit), which may eat up more silicon real estate (having 2 different instruction decoders instead of a single one). For example, when 32-bit was introduced with the 386, most of the opcodes in 16-bit mode and 32-bit mode were the same; only a prefix was available to specify when code used the "other" size (16-bit in 32-bit code or vice versa).
Some ARMs have dual-mode opcodes: either opcodes coded on 32 bits (4 bytes per opcode) or Thumb mode (2 bytes per opcode). Using Thumb mode, not all functionality is available (not all registers are usable for some instructions; only jumps are conditional, not arithmetic). To get all capabilities, 32-bit mode has to be used.
For ARM (the R means RISC), the instruction set is small, the decoding isn't complicated and doesn't eat much space, and adding a second "Thumb" mode for denser code isn't that hard and doesn't eat much space (especially because Thumb mode is less complex).
For AMD64, maybe it's not that easy because the x86 opcodes are already a mess (thanks to all the legacy, including some dating back to the 8080 8-bit predecessor [not binary compatible but at least source-code compatible]), and implementing alternate opcodes, completely rethought from the beginning, for 64-bit mode code may have used too much silicon (it's already nice that they cleaned up the memory model for 64-bit mode).
And then finally, the Joe Six-Pack user will be able to run his current work (browsing for porn?), the resource-eating Microsoft OS, and the huge pile of spambots/spyware/viruses/Sony rootkits/trojans/etc., all together, without the main task having to be switched out by the multitasking scheduler in favor of the others, and thus for the first time won't observe such a massive slowdown 2 days after the anti-virus license expired...
Awful Macro Devices/Awful Technologies Inc will continue to lose to Intel. Here's to hoping that evil business will cease to exist by the end of this year.
Therein lies the humorous part. Intel's "real lineup" is nothing more than a few minor tweaks and the application of modern processes to 10-year-old tech. You might be able to whip the buggy harder, add a second horse, but eventually the locomotive just wears you out as it passes you. (Even if it's quarter-scale)
As for delivering a knockout blow with the P4, I wouldn't count my chickens just yet. AMD's offerings still smoke Intel's in the server market, esp. anything over 2 CPUs.
The cesspool just got a check and balance.
If all you want is a cheap powerful processor to put into your gaming PC, then yeah, you don't care who dethrones whom. But if you work in the industry (and right now I'm writing documentation for an Opteron-based HPC system) then the Intel-AMD struggle is very interesting indeed. Every time AMD scores an upset over Intel, the whole marketplace changes.
And while you may be content to buy cheap technology 6 months after it's introduced, not everybody who buys hardware has that luxury. If you're spending 6 or 7 figures for a rack full of high-performance computers, then every little twitch in performance or pricing makes a big difference to your bottom line.