AMD Takes Opteron To 2.4GHz


Posted by timothy on Tuesday May 18, 2004 @11:48PM from the zhoooooommmm dept.

EconolineCrush writes "AMD has added a series of Opteron x50 processors to its workstation and server line that push the K8 core up to 2.4GHz. The Tech Report has tested the latest single and dual-processor Opterons against more than 20 other processors, including exotic Pentium 4 Extreme Edition chips, affordable Athlon 64s, and everything in between. Even if you have no interest in AMD's latest workstation chips, the review is worth checking out to see how two dozen of the fastest workstation and PC processors stack up in rendering, scientific computing, speech recognition, and even gaming tests."


AMD are back by RoderickMcDougall · 2004-05-18 23:55 · Score: 5, Interesting

They were lagging there for a while, but the benchmarks tell a good story. Looks like the Opteron is going to be yet another AMD chip that is great for gaming (and most other things). Hopefully a cheaper price than the P4s will really contribute to yet another dominating year for AMD.
Re:The Conclusion... by ishark · 2004-05-19 00:02 · Score: 2, Interesting

Thanks for the summary. I quickly browsed the article looking at some graphs, and I'm surprised by the bad performance of the Athlon 64s compared to the Athlon XPs in many of the tests.... Are any of those programs running in 64-bit mode, or is it just a test of 32-bit applications running on 64-bit CPUs?
I don't think it is puzzling at all by peterdaly · 2004-05-19 00:04 · Score: 5, Interesting

"Intel's apparent willingness to forego such enhancements in favor of adding ever-larger on-chip caches to the Xeon is puzzling"

Why is it puzzling? In their historic "Intel Inside" world, they were basically competing against themselves. Adding a bigger cache is not only easy, but a cheap way to rake in more cash without doing much R&D work.

It's not until recently that AMD has started "schooling them" on what improvement really means. Just look at how Intel is going to use the AMD x86-64 method in the upcoming Intel 64-bit platform. And now "If I were building (or, implausibly perhaps, buying) my ultimate workstation right now, I'd want a pair of Opteron 250s beating at the heart of it. The benchmarks speak volumes. For single-processor systems, the Opteron 150 looks like the fastest x86 CPU on the planet..." And this is at much lower MHz!

I believe Intel had thought they had reached monopoly status, which really they had, and the culture had become complacent. This did not happen at the underdog AMD, who has recently been able to quickly leapfrog Intel's offerings.

-Pete

--
Soccer Goal Plans
1. Re:I don't think it is puzzling at all by ivan256 · 2004-05-19 02:06 · Score: 2, Interesting
  
  The sentence before your quoted line there speaks volumes to the issue.
  
  Surely a pair of Xeons on shared bus ought to have this same advantage.
  
  It's way easier to ramp up the bus speed for a single processor, since it only has to interact with one other device. It's considerably harder to increase the speed when there are three devices on the bus instead of only two. Since the Opteron uses point to point connections they don't have this same problem. In that sense it's not really puzzling at all. They probably just can't get it to work.
2. Re:I don't think it is puzzling at all by Michael+Spencer+Jr. · 2004-05-19 14:40 · Score: 2, Interesting
  
  I hate to sound like I'm being contrary, but I don't really know enough about the subject:
  
  You are forgetting a key deficiency of the P4 "netburst" architecture: its incredibly long pipeline, which makes it very susceptible to cache misses. Therefore the larger the L2 cache the less of a performance hit the processor will take if it misses an instruction or two.
  
  I just finished a Computer Architecture class at the local university. While I'll probably forget 90% of what we learned in that class in another year, I'll ask while it's fresh on my mind:
  
  What does a long pipeline have to do with the cache hit/miss ratio?
  
  We learned about some hypothetical five-stage-pipeline CPU in class, which is child's play compared to the superpipelined monsters of today. However, if the same concepts still hold, a longer pipeline just increases the stall penalty.
  
  For those who haven't yet had their heads pumped full of Computer Architecture trivia, I'll recap what little I learned in class, so the question makes sense:
  
  A CPU is like a big assembly line. Its job is to read a bunch of instructions and execute them in order as they come down the assembly line. In an ideal world, with a program that never loops and never branches, it works JUST LIKE an assembly line, munching on instruction after instruction.
  
  CPUs operate at a clock speed, and receive a clock pulse at regular intervals. They are supposed to be able to complete whatever work they're working on each clock cycle, so a really simple one-stage CPU would need to have a clock speed slow enough that any instruction can be completed in that length of time.
  
  People figured out that instructions can be split into little pieces, such that these little pieces are each simpler than the whole instruction. That lets them build faster but more complex pipelined CPUs. Each pipeline stage might have some work to work on, but all pipeline stages can work at the same time.
  
  So this means that if the pipeline is full of instructions, and every instruction uses every stage, then the CPU is performing one instruction per clock cycle. This is better than before though: these clock cycles are tiny because they just have to be big enough for these tiny fractions of an instruction. So we get the speed benefit of quick clock cycles, but we're still performing a full instruction each cycle. That's something like a 5x speedup if you have five pipeline stages!
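
  The "something like 5x" figure can be checked with a little arithmetic. What follows is only a rough sketch under idealized assumptions that are not from the class described above: every stage takes exactly the same time, splitting an instruction into stages adds no overhead, and nothing ever stalls. The function names are made up for illustration.

```python
def unpipelined_time(n_instructions, stages, stage_time):
    # One long clock cycle per instruction; the cycle must cover
    # all of the work, i.e. stages * stage_time.
    return n_instructions * stages * stage_time


def pipelined_time(n_instructions, stages, stage_time):
    # The first instruction needs `stages` short cycles to drain through;
    # after that, one instruction completes every short cycle.
    return (stages + n_instructions - 1) * stage_time


def speedup(n_instructions, stages):
    # The stage time cancels out, so any positive value works here.
    return (unpipelined_time(n_instructions, stages, 1.0)
            / pipelined_time(n_instructions, stages, 1.0))


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, "instructions:", round(speedup(n, 5), 2), "x")
```

  Under those assumptions a single instruction gains nothing, and the speedup only approaches 5x as the instruction stream gets long enough to keep all five stages busy.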
  
  It doesn't always really work this way though. See, instructions can depend on each other, and that causes problems. There can be dependencies like Read-After-Write, where instruction 100 does some math and puts the result in a register, like A, and instruction 101 uses the result (in A) in its own calculation. Normally that would be fine, but a pipelined CPU tries to do things at the same time.
  
  For CPUs as simple as we talked about in class, there are two solutions: stalling and forwarding.
  
  In forwarding, the CPU is smart and looks ahead and figures out "this instruction needs something that the previous instruction is providing", and just short-circuits the whole formal writing and reading process, and just kinda passes the answer under the table between pipeline stages. "Psst, hey M-stage, this is E-stage. I've got the answer to A if you want it..."
  
  In stalling, the CPU realizes it NEEDS the answer to one instruction before it can do the next, so it starts wasting work units. Stages start getting commands saying "do nothing", which wastes CPU cycles. So in the example above, where instruction 101 needs something instruction 100 is still creating...suppose instruction 100 is multiplying two 256-bit floating point numbers together. Instruction 100 is going to take TONS of time to finish, so instruction 101 just gets stuck at the decode pipeline stage, sitting there tapping its feet and executing an "are we there yet" check every clock cycle. The rest of the pipeline goes unused.
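
  The difference between stalling and forwarding can be sketched by counting cycles. This is a toy model, not a description of any real CPU: it assumes the textbook five-stage pipeline (fetch, decode, execute, memory, write-back), single-cycle execution, and a register file that can be written and read in the same cycle. The function name and the two-instruction program are hypothetical.

```python
def total_cycles(program, forwarding, stages=5):
    """Count cycles to run `program`, a list of (dest, [sources]) tuples.

    Assumed model: without forwarding, a consumer cannot decode until its
    producer writes back, so it must be fetched at least 3 cycles after
    the producer. With forwarding, the result is handed straight to the
    execute stage, so back-to-back issue (1 cycle apart) is fine.
    """
    if not program:
        return 0
    min_gap = 1 if forwarding else 3
    last_writer = {}  # register name -> fetch cycle of its latest producer
    cycle = 0         # cycle in which the next instruction is fetched
    for dest, sources in program:
        for src in sources:
            if src in last_writer:
                # Read-After-Write: wait until the producer is far enough ahead.
                cycle = max(cycle, last_writer[src] + min_gap)
        last_writer[dest] = cycle
        cycle += 1
    # The last instruction was fetched at cycle - 1 and needs `stages` cycles.
    return (cycle - 1) + stages


if __name__ == "__main__":
    # "Instruction 100" writes A; "instruction 101" reads A right away.
    program = [("A", ["B", "C"]), ("D", ["A", "E"])]
    print("with forwarding:", total_cycles(program, forwarding=True))
    print("stalling only:  ", total_cycles(program, forwarding=False))
```

  In this model the dependent pair costs two extra "do nothing" cycles when the CPU can only stall, and none when it forwards.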
  
  For my next trick, I'll tie
Ho Hum by A.+Pizmo+Clam · 2004-05-19 00:09 · Score: 0, Interesting

Another day, another coat of more-Gigahertz paint slapped on the crude and aging x86 architecture. Open Source OS's like Linux have allowed us to finally take advantage of the bleeding-edge software technology of the 70's, yet the Intel/AMD duopoly keeps us stuck in even more primitive waters for hardware.

Perhaps, if Microsoft's software monopoly ever gets seriously challenged, we'll finally have a chance to take this register-starved, CISC-mired turkey out back and give it a proper burial.

--

Thank you for your support.
1. Re:Ho Hum by Anonymous Coward · 2004-05-19 01:28 · Score: 1, Interesting
  
  > x86-64 only doubles the number of registers.
  
  Yes, and modern compilers make quite good use of what they have. Why build/buy more when the returns diminish so rapidly?
  
  > Something tells me if the billions of dollars per year in R&D were spent on a fully-RISC system, externally and internally, it would be much faster, saving a stage or two of decoding and other internal management, saving a lot of design and testing hassles.
  
  RISC requires higher memory bandwidth, and more memory, to hold the small function instructions. Modern "CISC" CPUs are RISC inside - they load a small "powerful" instruction and execute multiple RISC instructions within themselves - without loading all those instruction words over the memory bus.
  
  > For over half a decade, DEC held its own against Intel with $70M / year CPU development budget, when Intel was spending $2B. They only got tripped up with poor marketing and problems and delays in fabbing the EV6 and EV7.
  
  DEC had a radically different cultural ethic than Intel. While modern American Corporate Management would like to delude themselves otherwise, poor corporate ethics drive costs higher in many ways.
  
  > For one, being fully RISC made it far easier to validate the chip design because it didn't involve lots of work disassembling instructions and keeping track of the results, predicting properly and so on.
  
  These days CPUs are basic software design. Build and test a RISC core. Then, build and test an interpreter. Layers my boy, layers. A technique as old, and testable, as the hills.
2. Re:Ho Hum by flex941 · 2004-05-19 03:14 · Score: 2, Interesting
  
  Is it possible for Intel/AMD to make those chips so I can turn off the x86 emulation crap and use internal RISC directly ... so everyone could slowly migrate away from x86 and CISC?
AMD's Cool 'n Quiet by niko9 · 2004-05-19 00:20 · Score: 4, Interesting

...the Opteron 150 looks like the fastest x86 CPU on the planet.

I know I might be nitpicking here, but I really wish the Opteron series of chips incorporated AMD's Cool 'n Quiet technology.
From what I read on their website, with a supporting motherboard and driver (2.6.5 has a native driver) the Athlon 64 can scale down to 800MHz, cool enough for the system to shut the HSF and case fans off completely.

One demo I saw online had an Athlon 64 SFF computer playing a DVD while the AMD Cool 'n Quiet app was showing the CPU at 800MHz and the system was totally silent.

Couldn't server rooms benefit from the reduced electricity bill also?
Re:The Conclusion... by Jeff+DeMaagd · 2004-05-19 00:24 · Score: 4, Interesting

The reason is that a three-drop bus used for Xeon DP (533MHz bus), five-drop for Xeon MP (400MHz bus), can't operate as fast signalling-wise as a point-to-point bus used for Pentium 4 and all Athlon systems, 1 and 2 processor. Termination was just too difficult, I think. Before HyperTransport, the wiring for multiprocessing with only a point-to-point bus was ridiculously expensive, particularly on the chip that connects the CPUs to the rest of the system.

AMD got a little unconventional and this time it paid off on Opteron. It didn't work so well with the Athlon MP because of this wiring problem: too many wires and too expensive a core chip. It was 1000+ pins when 600 pins was thought to be expensive.
Re:Waste of time... by phoxix · 2004-05-19 00:57 · Score: 2, Interesting

Can somebody please benchmark a dual AMD opteron against a dual PPC 970 (MAC G5)

Not so fast, a significant problem in such a comparison is that gcc has *much* better support for x86-64 than it does for PPC64. If there was even a chance that a dual PPC970 machine was faster than a dual x86-64 machine, the likes of Yellow Dog, and Momentum Computer would have been all over it.

Sunny Dubey
Question about itanium2 - Opteron by cazzazullu · 2004-05-19 01:30 · Score: 2, Interesting

Here at the physics lab we are doing research about neural networks. This involves simulations that require a lot of memory and CPU cycles. A problem we have encountered numerous times when building phase diagrams is that the mathematical routines crash when we reach critical parameter values. This is caused by the fact that certain matrices become 'singular'. This problem does not arise, however, when we use 'double long' formats, or 64-bit floats, because these are way more precise and can still go a long way when 32-bit floats already jump to zero, thus causing the problems.
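
The precision effect described above can be reproduced in a few lines. This is only an illustrative sketch, not the lab's actual routines: the 2x2 matrix, the perturbation of 1e-9, and the helper names are all made up, and 32-bit arithmetic is emulated by rounding every intermediate value through the standard `struct` module. Note that floating-point width is a property of the number format rather than of the CPU's word size, so 64-bit floats are also available on 32-bit x86 processors.

```python
import struct


def to_float32(x):
    # Round a Python float (64-bit) to the nearest 32-bit float and back.
    return struct.unpack("f", struct.pack("f", x))[0]


def det2(a, b, c, d, rnd=lambda v: v):
    # Determinant of [[a, b], [c, d]], applying `rnd` after every step
    # to mimic arithmetic carried out at a lower precision.
    a, b, c, d = rnd(a), rnd(b), rnd(c), rnd(d)
    return rnd(rnd(a * d) - rnd(b * c))


if __name__ == "__main__":
    # A nearly singular matrix: the rows differ by one part in a billion.
    m = (1.0, 1.0, 1.0, 1.0 + 1e-9)
    print("64-bit determinant:", det2(*m))
    print("32-bit determinant:", det2(*m, rnd=to_float32))
```

At 64 bits the determinant is tiny but nonzero; at 32 bits the perturbation is rounded away and the determinant is exactly zero, which is the point where a matrix inversion routine would fail.
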
We have decided to buy/construct a fast 64-bit workstation where we can run our simulations without crashes. Now my question to you fellow Slashdotters is:
The budget is a few thousand euros, not over 10 000 (this is comparable in dollars). What would the best bang-for-the-euro be? Single-Dual? Xeon-Opteron-Itanium2? It must at least contain 4 gig of RAM.
Thanks for your suggestions, looking at several "comparison-websites" has only made us more confused.

--
int main(void) {while(1) fork(); return 0;}
1. Re:Question about itanium2 - Opteron by Spoing · 2004-05-19 04:36 · Score: 4, Interesting
  There's another thing. Opterons are going to become dual-core in less than 2 years, with the same pinout as today. That means that if you have a lowly 2-way server that you're thinking about dumping, you can buy new dual-core Opterons and instantly get a 4-way out of your old 2-way server.
  
  Even if the pinouts stay the same, the system boards you can buy now might not support the processors being sold in 2 years.
  Why upgrade the CPU only in 2 years and skip the other improvements available at that time?
  I have very infrequently had a CPU upgrade that was worth it, while updating other components (disk, network, added RAM, video, ...) usually does give a reasonable boost. Most of the time the modest real performance increase from swapping in a new CPU -- one that is bound to the limits of the existing system board -- isn't worth the time or money.
  The only exception I can think of is if you buy behind the bleeding edge and upgrade every 6-9 months to a processor that is substantially better (2x) but not bleeding edge.
  To do this properly usually requires getting an advanced system board that can handle the higher end components and then turning around and being cheap on the CPU. While this can be a good idea, it usually isn't and the situation is very specific to the system board.
  IMNSHO:
  
  Always buy what you need today and do not look over 6-9 months in the future for upgrades.
  
  If you expect a payoff in a future upgrade, make sure that the hardware you buy now is also what you need today and do not depend on a future promise. If it works out, HOO-RA! If not, you haven't lost a thing.
  --
  A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Quad Opterons? by ele7ven · 2004-05-19 05:45 · Score: 2, Interesting

Seldom do you get to see the performance of quad Opterons in benchmarks. With AMD's HyperTransport technology, the 800 series decimates even the newest 4MB L3 cache Xeons. Perhaps, however, it's that reviewers realize they don't need to show the complete scaling potential of the Opteron to make the point that it's a superior workstation CPU.
Intel is competing as best they can (ie., poorly) by branchingfactor · 2004-05-19 07:37 · Score: 3, Interesting

I respectfully disagree that Intel was ever competing with itself. They've been competing with AMD in the desktop/workgroup market for a long time now, and with Sparc/MIPS/Alpha in the enterprise market as well. Intel developed the high-clock rate Pentium 4 to compete directly with AMD's Athlon, after the Athlon whooped the Pentium 3. The Intel marketing people saw how much leverage AMD got from being the first to 1GHz with their Athlon and they didn't want that to happen again. Intel was *severely* embarrassed by losing the race to 1GHz. The Intel marketing people incorrectly concluded that the market was buying clock rate rather than performance. So they mandated a CPU that would have the highest possible clock rate, irrespective of performance. That's the P4/Netburst. Now they are getting burned on performance because AMD has shifted the dialog from clock rate to benchmarks. Intel also saw with the success of the Pentium M that benchmarks can triumph over clock rate. So now Intel has finally realized that they misread the market and they have to change their entire product strategy.