Slashdot Mirror


AMD Talks About Internal Benchmarks for Opterons

ggruschow writes "AMD's CTO says their 2.0-Ghz Opteron (aka Hammer) beat a 2.8-Ghz Xeon (P4) on both SPECint2000 and SPECfp2000 tests, but was mixed against an Intel 1-Ghz Itanium 2 (details at ExtremeTech). IBM predicted "conservative" 1.8-Ghz PowerPC 970 scores, which fall in the middle of the pack (sweet for OS X). It's probably not a coincidence that AMD's news comes so soon after Gartner said x86-64 would fail. Even if Intel loses the performance crown again, their upcoming mobile processor is looking pretty spiff with its recently announced 1MB of cache. Sounds like next year might finally bring a worthy upgrade for my 486dx4-160."

20 of 295 comments

  1. *sigh* by chefren · · Score: 2, Informative

    Who cares what processor is slightly slower or faster than others? You need at least a 10% difference in overall system performance to notice anyway.

    Darn, I missed fp by thinking...

    1. Re:*sigh* by SomeGuyFromCA · · Score: 2, Informative

      The conversion rate is $1 == 1.02

      So it's fairly close.

      --
      if the answer isn't violence, neither is your silence / freedom of expression doesn't make it alright
    2. Re:*sigh* by tconnors · · Score: 5, Informative

      For straight CPU intensive tasks it matters.

      But for 99% of normal people's tasks, 10% won't matter.


      10% never matters. We regularly run simulations here that take a month. What is 10% on top of a month? 3 days. If you have already been waiting 30 days, what does another 3 matter? It probably corresponds to the weekend anyway.....

  2. Re:486 160 mhz? by Anonymous Coward · · Score: 1, Informative

    Search google, AMD did eventually release a 160 dx4. There was also a dx4 133.

  3. Clawhammer by Perdo · · Score: 5, Informative

    Clawhammer (Athlon) has a single 16-bit-wide HyperTransport bus.

    The workstation Sledgehammer (Opteron) has two 16-bit busses.

    The server Sledgehammer (Opteron) has three 16-bit busses.

    The spec results are as follows:

    Spec_int

    PIII 1GHz             426
    G4 1GHz               306
    G5 (IBM PowerPC 970)  937
    2.8GHz P4            1010
    XP 2800               933
    Itanium 1GHz          810
    Power4 1300           804
    Clawhammer 2.0GHz    1202

    Spec_fp

    PIII 1GHz             426
    G4 1GHz               187
    2.8GHz P4             947
    XP 2800               782
    Itanium 1GHz         1356
    Power4 1300          1169
    Clawhammer 2.0GHz    1170

    Opteron??? Higher than Clawhammer, considering the multiple HyperTransport busses, the 1/2 MB L2 (compared to Clawhammer's 256/512 L2), and the dual on-chip DDR memory controllers compared to Clawhammer's single memory controller.

    Bootleg Powerpoint Presentation:

    http://130.236.229.26/download/misc/AMD-Opteron.ppt

    and

    http://a26.lambo.student.liu.se/download/misc/AMD-Opteron.ppt

    Read the Show notes! AMD failed to edit them out

    The filename is AMD-Opteron.ppt; Google search it.

    Includes a system that is an Opteron workstation dualed with a Clawhammer that still presents itself as a single-proc system. The Clawhammer acts as a math co-processor :)

    --

    If voting were effective, it would be illegal by now.

  4. Re:486 160 mhz? (History lane) by zensonic · · Score: 5, Informative

    Your knowledge isn't that good. The fastest 486 in terms of Mhz was the AMD 5x86 - 133Mhz (4*33Mhz) chip. That chip easily overclocked to 160Mhz (4*40Mhz). In terms of Pentium performance (integer-wise) it was the equivalent of a P75 at 133Mhz and of a P90 at 160Mhz (give or take a few percent).

    In terms of performance the fastest chip that fitted in a socket 3 was the Cyrix 5x86 120Mhz, which (again speaking of integer performance) was equivalent of a P100.

    --
    Thomas S. Iversen
  5. I hope Hammer will fix the rc5 crippled speed!! by Anonymous Coward · · Score: 5, Informative

    I hope THIS mask rev of Opteron (Hammer) chip will be faster than January 2002 PowerPC G4 chips.

    Currently, according to the RC5 benchmarks, AMD is far slower than dual-CPU Macintoshes (half as fast). (Source is available for the core RC5 loops for most processors.) RC5 was silently completed in June or so, but a bug went unnoticed for a couple of months; either way, the contest is over. They measured performance in units of "Mac PowerBooks" in their press releases.

    The Mac Dual 1 Ghz g4 is faster than all existing dual AMD motherboards in RC5 benchmark by almost 100%.

    21,129,654 RC5 keyrate for dual 1 Ghz g4 system ! And Now apple sells dual 1.25 Ghz stock which would be even faster.

    A dual 1800+ AMD MP gets only HALF as many as a Mac! 10,807,034 rc5 keys !
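
As a quick check on the "HALF as many" and "almost 100%" figures, the two keyrates quoted above can be compared directly. This is only arithmetic on the numbers given in the post; the variable names are made up for illustration.

```python
# Keyrates exactly as quoted in the post.
dual_g4_1ghz = 21_129_654
dual_athlon_mp_1800 = 10_807_034

# How many times faster the dual G4 is, and the same figure as a percentage.
ratio = dual_g4_1ghz / dual_athlon_mp_1800
percent_faster = (ratio - 1) * 100

print(f"G4 / Athlon MP ratio: {ratio:.3f}")
print(f"G4 advantage: {percent_faster:.1f}%")
```

On these two numbers the ratio comes out a little under 2, which matches the "almost 100%" wording rather than "over twice as fast".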

    Funny "Mhz myth" there showing itself I guess... Apple now is selling even FASTER machines but with smaller caches and less fast read-write ram (it now uses DDR on newest boxes).

    And the macs are using low power g4 chips meant for microcontroller usages with very little predictive branching and a simple 7 stage RISC pipeline depth. (macs complete many many instructions per cycle though, unlike Pentiums).

    The mac I mentioned uses a 2 MB L3 cache, and no AMD MP dual cpu boards I know about have any L3 cache at all, so maybe that is why some common macs are over twice as fast; it's not just meager altivec tweaks to rc5. AMD has similar, but less amazing, vector ops.

    Another reason the mac might be over twice as fast as an amd dual mp board is not just the 2MB l3 cache but the fact that the mac can read and write to a cold page of memory simultaneously FASTER than any AMD MP designs, which are biased for linear access and streaming. Many memory scatter benchmarks show this too. Apple's newest DDR-RAM machines might not offer this feature though.

    So basically, will the new Hammer systems be able to get close to speed for RC5 and other crypto tasks as the RISC based Powerpcs?

    I really want to know. And I am so sad to see Slashdot reduced to fanboys modding down anything discussing tech subjects like this as "flames" all the damned time. This post is all informative and factual and my reason for asking is genuine.

    http://www.research.ibm.com/journal/rd46-1.html has 5 LARGE technical articles on how the POWER4 chip was designed... in PDF form too. Even if you do not appreciate the Power4 (which apple is using a dual-core version of in many months) you might want to read these PDFs because they are all about chip design.

    They put the floating point on the corners of the chip die to help spread heat, etc. Hundreds of interesting facts and pictures at that site.

    Top500.org lists Power3 dominating the cluster speeds of the top 500 computer clusters for memory+float speed. Power4 will soon start appearing in that list as well as the "lite" version with only 2 MB of cache instead of 4,6, and 16 MB.

    Plus the new chip apple will start using, announced yesterday, will have SIMD "VMX" or Velocity Engine added (Moto calls theirs "altivec").... only 90% of altivec's hundreds of opcodes will be offered though.

    With Pricewatch showing cheapest 800Mhz Itanium bare cpu at almost 8 THOUSAND dollars, and 3.5 thousand for the old itanium 700 Mhz, it does not take a financial genius to see why apple's workstations are selling so well nowadays.

    1. Re:I hope Hammer will fix the rc5 crippled speed!! by acidblood · · Score: 4, Informative
      I suggest you read the distributed.net Slashnet forum, where I explain why the G4 performs faster than x86 processors. Summarizing:
      • RC5 is completely parallelizable, so you could theoretically do as many simultaneous operations as you have execution units on your processor, as long as there's enough registers to mask memory load latency. Obviously, there's many more registers on PowerPC architectures than on x86.
      • The distributed.net core uses the Altivec SIMD extension on the G4, which has a useless rotate instruction, which serves absolutely no purpose that I know of on anything other than RC5 encryption. So I see Intel's point in not including a rotate instruction in SSE2: bit rotation is a completely useless operation except for RC5. Did I make my point clear enough? However, that makes it difficult to use SSE2, given the limited amount of registers available, coupled with the need to emulate a rotate instruction by means of shifts, ORs and an additional temporary register.
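
The shift/OR emulation mentioned in the last bullet can be sketched in a few lines. This is a scalar Python illustration of the idea only, not the distributed.net core and not SSE2 code; `rotl32` and `MASK32` are names chosen here for the example.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits using only shifts and an OR."""
    x &= MASK32
    n &= 31                       # only the low 5 bits of the count matter for 32-bit words
    left = (x << n) & MASK32      # bits shifted up, truncated to 32 bits
    right = x >> (32 - n)         # temporary: the bits that wrapped around
    return left | right

print(hex(rotl32(0x12345678, 8)))
```

The emulation costs two shifts, an OR and a temporary per rotate, which is the extra register pressure the bullet is complaining about; a native rotate does the same work in one instruction.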

      It must be clear that, if Intel had included an SSE2 rotate op, the P4 would easily beat a G4, not at the same clock speed, but given that a G4 can't scale as well as a P4 it wouldn't matter anyway.

      Hammer can't get any better on RC5 without an instruction set overhaul. Athlons already do pipelined scalar integer rotates in 1 clock cycle; it's impossible to beat that.

      Also, please do not generalize G4's distributed.net RC5 speed to a ``PowerPC superiority in crypto tasks,'' because it makes me want to laugh really hard at your cluelessness. SIMD is completely useless in real-world crypto applications: when you use a cypher in Output Feedback mode, which is how stuff is done in the real world when you're encrypting data instead of trying to break keys, you need to know the output of the last crypto operation to mix in the next operation. It should be obvious that you can't do operations in parallel now, so SIMD becomes useless and the Athlon goes back to being faster than the G4 at the same clock rate, and of course much faster on commercially available speed rates.

      Oh, and the larger cache you mentioned has absolutely ZERO effect over RC5 performance. RC5 memory usage for each key being encrypted/decrypted is:
      • number of bits in key rounded to the next 32-bit multiple (64 bits in RC5-64, 96 bits in RC5-72)
      • number of cypher rounds plus one, times 8 bytes (12 rounds in the RSA Secret Key challenge equals 104 bytes)
      • 8 bytes for two temporary variables, which hold the plaintext before encryption and the cyphertext after encryption, or the cyphertext before decryption and the plaintext after decryption.

      As you can see, even if you take into account loop control variables and whatever else, it boils down to less than 150 bytes per key. You could probably fit a 60-wide superscalar core on the P4's measly 8 KB L1 cache.
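
The three items above add up as follows. This is a sketch of the arithmetic exactly as the post states it; the function name and its parameters are invented for the example, and loop counters and other overhead are left out.

```python
def rc5_bytes_per_key(key_bits: int, rounds: int = 12) -> int:
    """Per-key working set, following the three items listed in the post."""
    key_words = -(-key_bits // 32)    # round the key up to whole 32-bit words
    key_bytes = key_words * 4
    subkey_bytes = (rounds + 1) * 8   # (rounds + 1) * 8 bytes of round keys
    temp_bytes = 8                    # one 64-bit plaintext/cyphertext block
    return key_bytes + subkey_bytes + temp_bytes

for bits in (64, 72):
    print(f"RC5-{bits}: {rc5_bytes_per_key(bits)} bytes per key")
```

Both cases land comfortably under the 150-byte figure, so the working set fits in even a small L1 cache and an L3 cache has nothing to contribute.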
      --

      Join the NFSNET. Our prime goal is making little numbers out of big ones. http://www.nfsnet.org/

    2. Re:I hope Hammer will fix the rc5 crippled speed!! by randombit · · Score: 3, Informative

      The distributed.net core uses the Altivec SIMD extension on the G4, which has a useless rotate instruction, which serves absolutely no purpose that I know of on anything other than RC5 encryption.

      I'll admit I don't know Altivec too well. But I can pretty much guarantee you that a SIMD rotate instruction would be fairly handy on a reasonable number of crypto algorithms (RC6 and MARS come immediately to mind). Assuming it's doing what I figure it's doing based on your statement.

      BTW, SIMD is useful in some crypto algorithms. In particular, I'm thinking of UMAC16, which was designed to be used with MMX or AltiVec. Yes, in most situations it's hard or impossible to run the high-level operations in parallel (though you can with Counter mode and when decrypting CBC -- they can both be done infinitely in parallel). And some algorithms do have operations internally that can be implemented with SIMD (mostly by design).
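
The difference between the serial and the parallel modes discussed in this sub-thread can be shown with a toy keystream generator. This is a sketch under loud assumptions: `toy_block` is a stand-in built from SHA-256, NOT a real block cipher and not RC5, and all names and the 8-byte block size are chosen here for illustration only.

```python
import hashlib

BLOCK = 8  # bytes per block; arbitrary for this sketch

def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's encrypt function (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ofb_keystream(key: bytes, iv: bytes, nblocks: int) -> list:
    """Output Feedback: each block is computed from the previous output."""
    out, state = [], iv
    for _ in range(nblocks):
        state = toy_block(key, state)   # needs the previous iteration's result
        out.append(state)
    return out

def ctr_keystream_block(key: bytes, nonce: bytes, i: int) -> bytes:
    """Counter mode: block i depends only on the nonce and the index i."""
    counter = (int.from_bytes(nonce, "big") + i) % (1 << (8 * BLOCK))
    return toy_block(key, counter.to_bytes(BLOCK, "big"))

key, iv = b"k" * 16, b"\x00" * BLOCK
ofb = ofb_keystream(key, iv, 4)

# CTR blocks can be computed in any order (or side by side) with the same result.
ctr_forward = [ctr_keystream_block(key, iv, i) for i in range(4)]
ctr_reverse = [ctr_keystream_block(key, iv, i) for i in reversed(range(4))][::-1]
print(ctr_forward == ctr_reverse)
```

In the OFB loop there is a data dependency from one iteration to the next, so several blocks of one stream cannot be processed side by side. In the CTR function every block is independent, which is what leaves room for SIMD or multiple execution units.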

  6. Pentium 4s have no shared cache. uni-processor by Anonymous Coward · · Score: 2, Informative

    Pentium 4s have no shared cache. uni-processor designs only.

    If you want DUAL cpus, or more, you have to go mac or AMD to get speed per dollar.

    and macs are twice as fast as the fastest AMD for rc5 benchmarks.

    a pentium 4 is a heatwasting joke once you start using 2 or more cpus.

    Apple is only selling dual cpu machines now. And when the dual core Power4 ships in 8 months or less, they might be offering 4 cpus economically as a stock product. Even if they do not, many 3rd party dual cpu board suppliers for macs exist, such as Sonnet Technologies.

    1. Re:Pentium 4s have no shared cache. uni-processor by Toraz Chryx · · Score: 2, Informative

      1) Xeon
      2) Macs won't be shipped with POWER4s in them, they'll _probably_ be shipped with PowerPC 970s (which are effectively single-core Power4s + VMX)

  7. WRONG! RISC "ordinary computers" exist! by Anonymous Coward · · Score: 3, Informative

    WRONG! RISC "ordinary computers" exist!

    You wrote "why don't we find them in our ordinary computers"!

    In fact I am using one as I type this. It was built in 1996 (yes nineteen ninety six) and has an 800 Mhz G4 accelerator in it from Sonnet.

    It's my "internet" machine; I use other RISC machines for programming, not wired to any external networks.

    It runs a wonderful version of Microsoft Office at full speed (RISC) and launches MS word in 2 seconds cold. (yes two seconds to flashing cursor).

    no intel emulation needed.

    it's called a Macintosh

    millions of macs exist and millions of macs use one or more risc processors and almost no mac people I know ever want to emulate a pc running windows EVER if they can help it.

    RC5 and other benchmarks are twice as fast on standard macs than AMD, and Pentium 4s have no multi-cpu board designs...

    If you want to run thousands of high end commercial shrink wrapped products in RISC you can, but only on macintosh. And they run very well in the new Jaguar 10.2 (though faster in 8.6).

    1. Re:WRONG! RISC "ordinary computers" exist! by Corporate Troll · · Score: 2, Informative
      • Macs are PCs... PC is just an abbreviation for "Personal Computer". Use x86 instead.
      • They give back to the BSD community. Not everything of course (look, giving Aqua under the BSD license is suicide), but I'm pretty sure that PPC machines (and thus macs) are better supported now by OpenBSD and NetBSD. That is contributing back! Don't forget that they don't even have to because it's the BSD license which damn well allows you to take the code and keep it for yourself.
      • Apples are now pretty generic computers with standard PCI, standard RAM; only the CPU is different. Back in the day they did tricks with ROMs etc...but those are gone by now.
      • High pricing? eMac: G4 700Mhz, 40Gig HD, 1Gig RAM, NVidia GeForce2 MX, CD-R/RW for 1600Euro. Ehm... I think that's pretty good bang for the buck! And waaaaay prettier than any beige box you can get.
        I personally own an iBook, and a comparable Dell was really about the same price. I agree that the dual G4's are a bit pricy but look at the prices of a nice Dual Proc Dell workstation fully equipped and then we'll talk again. Oh, and then don't forget that Macs last longer.
        Always compare prices of Apple computers to Dells, Compaqs, etc. Don't start with the idea: "I can build something better cheaper". I know that, you know that, but it's a different market.
    2. Re:WRONG! RISC "ordinary computers" exist! by Corporate Troll · · Score: 2, Informative
      • Excused.... I'm nitpicking on it anyway. I just don't like it that people say that Macs are not personal computers
      • It doesn't make it a good thing, but they have been nice and contributed back. This is a major difference in comparison to what Microsoft does.
      • Yes, they sucked. They play nice now. Look at IBM, they sucked in the late eighties, now they rock. Companies change. Perhaps in 10 years we'll all love Microsoft around here.
      • I don't know where you live... But I just went to the website of Dell, and configured an x86 that is (except for the CPU) equivalent to a Mac with Mac OS X:
        • Dimension 2300 Value
        • Intel® Celeron® processor 1.7Ghz
        • 1024MB 133MHz SDRAM (2x512MB)
        • 40GB ATA-100 Ultra DMA
        • Dell E772 17'' (15,9'' VIS)
        • 48x CD-Burner (CD-RW)
        • Dell stereo speakers 206
        • 10/100MB network card
        • 56K V.90 PCI Data/Fax Modem
        • Dell Movie Studio I (IEEE 1394)
        • Microsoft® Windows® XP Home
        • Microsoft® Works 6.0
        • Total Price including VAT: 1706,10 EUR
        Everything I selected as an extra was fair, because the Mac comes standard with it. People just find Macs expensive because Macs come with everything and a kitchen sink. This machine is equivalent to the G4 I described. Yes, except the CPU, yes, I know.
        Perhaps model development is slower, but I don't think it is that much an issue.
  8. Windows XP by droyad · · Score: 5, Informative
    I hear that people are saying it would be difficult to port Windows XP to RISC chips (and new 64bit arch). This in fact is not true. In the Windows NT family there are 2 features that make it easy:

    1) It's mostly written in c/c++
    2) The HAL (Hardware Abstraction Layer) contains most of the platform specific code. As I understand it the kernel does not actually handle the hardware directly
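
The layering idea in point 2 can be illustrated with a toy example. This is a generic sketch of a hardware abstraction layer, not Windows NT's actual HAL interface; every class, method and page size below is hypothetical and chosen only to show the shape of the design.

```python
from abc import ABC, abstractmethod

class Hal(ABC):
    """Hypothetical platform interface: the only part rewritten per architecture."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def page_size(self) -> int: ...

class X86Hal(Hal):
    def name(self) -> str:
        return "x86"

    def page_size(self) -> int:
        return 4096

class Risc64Hal(Hal):
    def name(self) -> str:
        return "made-up 64-bit RISC"

    def page_size(self) -> int:
        return 8192   # illustrative value for the made-up platform

def pages_needed(hal: Hal, nbytes: int) -> int:
    """'Kernel' code: portable, because it only talks to the Hal interface."""
    return -(-nbytes // hal.page_size())   # ceiling division

for hal in (X86Hal(), Risc64Hal()):
    print(hal.name(), pages_needed(hal, 10_000))
```

Porting in this picture means writing one new `Hal` subclass and recompiling; `pages_needed` and everything else above the interface stays untouched.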

    Of course I can see it going like this:
    1) Apple, Intel, AMD and Motorola put forward new Chip designs
    2) They ask MS to support it with their OS
    3) MS picks Intel

    --

    $vi any_article_on_iraq
    :s/iraq/microsoft/gi
    :s/Weapons of mass destruction/Windows/gi
    :s/Axis of evil/Redmond/gi
    :s/In this post september 11 climate/Service Pack 1/gi
    :s/Bush/Linux/gi
    :wq

    1. Re:Windows XP by Anonymous Conrad · · Score: 2, Informative

      MS have been supplying developers (like myself) with 64 bit SDKs for at least 6 months, and migration information (i.e. recommendations for writing portable code) for at least 12 months.

      *Way* longer than that.

      In late 1999, MS shipped a crippled 64-bit compiler in their platform SDK for syntax/portability verification. They began shipping a functional compiler and libraries six to nine months later. My then employers (a network card manufacturer) used to get weekly or fortnightly pre-release builds of Win2k and I'm fairly sure they had Itanium builds up to November 1999 or so - when they just stopped. We didn't have Itanium hardware anyhow.

  9. paradigm shift... by john_uy · · Score: 4, Informative

    i think the new release of hammer lines will be very difficult for amd. intel is one step ahead. if you see right now, they are already announcing next generation product lines in all fronts. like banias in cpu, ultra low voltage and integrated chips for small devices, extremely high speed chips for network devices.

    i believe intel has shifted its focus in the battle of the desktop cpus. while amd is just playing catch up, intel now is already looking at what consumers will benefit from. maybe intel has realized that the speed today is an overkill for majority of today's needs. they are just speeding up their chips to keep up with moore's law.

    but look at their products, right now, they are focusing on making things smaller, lightweight, ultra low power consumption, low heat devices, integration. the future is not on desktop computers requiring very high speed cpu but mobile devices such as phones, pda, tablets, etc. intel will be a clear winner (if only i have humongous money so i can buy intel stocks at discount.)

    they have good engineers that produce good results. right now, they are already producing better chipsets for their server product lines, maybe a few years, they will no longer rely on broadcom's serverworks.

    they are also picking up on their storage chips. from all the raid controllers in the market, i hardly see a card that does not have an intel 960 i2o processor or their new ixp processors.

    their network and communication is very dynamic. like introducing 10gigabit products today (even with the downturn of telecoms.) enabling encryption and decryption at 10gb/s is no joke. maybe a few years from now, we will see intel as chips in those network gear from cisco, et al.

    they are now focusing on wireless integration. few years from now, capacitors and resistors will be in a silicon chip. it is the future, and they are very lucky to realize that. when the economy recovers, intel will clearly be a winner.

    and for the server, i would want to say this. i believe amd will produce good cpu. but that is just half of the story, amd is not emphasizing any good chipsets/system to come with it including support for pci-x at 133mhz with hotplug slots, interleaved memory with chipkill(tm), good server management, good integration.

    (as one who decides what to purchase in a server,) amd must make a lot of effort before i will take them seriously. their cpu is not enough for me to get their system, yet.

    let's just wait and see, but i see that intel will always be a step ahead. now for amd, the challenge is to be at par or even be ahead of intel.

    --
    Live your life each day as if it was your last.
  10. Re:kids nowadays... by nelsonal · · Score: 3, Informative

    Your joke reminds me of the ancient Egyptian symbol for a large number. It was a man with his arms upraised as if saying it's incomprehensible. I think it was used for numbers larger than 1000 or 1000000.

    --
    Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
  11. PPC is not a great example of RISC by be-fan · · Score: 4, Informative

    I hate it whenever Mac-heads point to PPC and show how it's such a great example of RISC that runs "all your programs 2x as fast as the fastest Pentium4!" In all reality, the PowerPC line (not necessarily the POWER line) is very unimpressive. These days, a 1.25 GHz Alpha can still hold its own against a 2.5 GHz P4 in terms of floating point power. Yes, the same Alpha that has been neglected for the last half-decade, whose design has stagnated since the 21264, and whose process technology is antique compared to AMD's and Intel's. But the Alpha still keeps kicking x86 in the head. Yet the PowerPC, running at the same 1.25 GHz, backed by the dual giants Motorola and IBM, built with leading edge copper fab technology, the second most common desktop RISC architecture (after x86 :) shipping in every single Apple computer, isn't even competitive with the P4. Damn you DEC! Damn you to all hell!

    --
    A deep unwavering belief is a sure sign you're missing something...
  12. Re:Everyone, look AWAY from the clock speed. by drinkypoo · · Score: 3, Informative
    If people were doing more threading or planning to actively run more processes at once, then SMP would be more attractive. Unfortunately too few applications make use of multiple processors, and too few operating systems provide relocatable threads.

    P4 hyperthreading will hopefully get people into threading. Athlon will have slick four way and eight way multiprocessing with hammer when it finally rolls out. Halfway to 2003. I'm a student so I won't be buying until it comes out... That's what you get for delaying to add palladium you bastards.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"