Slashdot Mirror


AMD Athlon Multi-Processor Under Linux

An Anonymous Coward writes: "Just saw this review at GamePC. It's a pretty extensive review of AMD's entry into the multiprocessor arena, full of exciting benchmarking results. The full text is here."

19 of 108 comments

  1. NetBSD runs on SMP Athlons by Anonymous Coward · · Score: 4

    FYI, NetBSD boots and runs on dual-CPU Athlons. You just have to build your system from the (experimental) nathanw_sa CVS branch. Here's a dmesg posted to the NetBSD tech-smp mailing list:

    http://mail-index.netbsd.org/tech-smp/2001/06/05/0000.html

  2. Re:Has anyone done a comparison? by Christopher+Thomas · · Score: 4

    Has anyone done a cost-efficiency comparison of dual-cpu performance vs. a single cpu when considering the costs involved (special SMP boards, etc.)? In other words, is it more economical to buy two web servers or one smp server with tons of ram? Do certain applications (cpu intensive obviously) save money with SMP systems versus others that depend on IO throughput, etc., and what applications are those?

    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth. Web serving has zero internal communications requirement, and so falls into this category. Things like ray-tracing have low communications requirements when partitioned properly, which is why you use clusters as render farms instead of massively parallel Big Iron.

    SMP has overhead from coherence operations, and more complex and expensive chipsets.

    SMP benefits tasks that lend themselves to shared-memory implementations. It's a lot easier to toss ownership of memory pages back and forth inside an SMP machine than it would be to send modified pages back and forth across a network. I don't have examples of this kind of task offhand, but I'm sure they exist.

    All of this is for CPU-bound tasks. For I/O bound tasks, you're still better off splitting it up into multiple machines if it's easily parallelized, but again I don't have good examples to illustrate with off the top of my head.
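
    A rough way to picture the trade-off above is a toy model in which compute time divides evenly across nodes while communication cost grows with the node count. This is only a sketch: the function names, the linear communication term, and the numbers are illustrative assumptions, not measurements of any real cluster or SMP box.

```python
def parallel_time(work_s, nodes, comm_s_per_node):
    """Toy model: compute splits evenly; communication grows with node count."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    # A single node has nobody to talk to, so it pays no communication cost.
    comm = comm_s_per_node * (nodes - 1)
    return work_s / nodes + comm


def speedup(work_s, nodes, comm_s_per_node):
    """Speedup relative to running the whole job on one node."""
    return work_s / parallel_time(work_s, nodes, comm_s_per_node)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the curve.
    for comm in (0.0, 1.0, 10.0):
        row = [round(speedup(100.0, n, comm), 2) for n in (1, 2, 4, 8)]
        print(f"comm={comm}s/node  speedup at 1,2,4,8 nodes: {row}")
```

    With zero communication cost the model scales linearly, which is the web-serving case; as the per-node cost rises, adding nodes eventually makes things slower, which is where shared memory starts to look attractive.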

    For more information, pick up a couple of good books on parallel computer architecture and parallel programming. Your local university's bookstore will stock these.

  3. Re:Has anyone done a comparison? by Christopher+Thomas · · Score: 4

    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth.

    I have several problems with this generalization. First, parallelizing over multiple servers always adds overhead (in both $$$ and performance) of its own. How are you going to spread a load over multiple web servers? You need a load balancer, either the dedicated (pricey) hardware kind or a standard server converted over to load balancing service (which doesn't get you the greatest speed or scalability in the world). Even in a scientific application that you spread over several boxes, you need some kind of load balancer or traffic cop to get an equitable distribution of work.

    You make a valid point, in that load-balancing is an issue. However, I'm assuming that in the case of a web server, if you have enough traffic to need more than one server, you have enough money to buy a hardware load-balancer to spread out requests (and a hardware firewall, if management has any sense).

    As for scientific applications, you need load balancing regardless of whether the processors running the threads are in one box or in several. This is usually handled transparently by the OS, the compiler, the communications library, or a combination of the above (usually all of the above). This is standard for any high-performance computing project, and so doesn't add to your maintenance overhead. It also doesn't contribute substantially to the processor workload, so I don't see it as much of a concern for scientific workloads.

    Second, let's not forget that two-way (and even four-way, if you were in the Xeon market to begin with) boxes have gotten much cheaper in the past year or two. Most of the important server availability features, like hot swap drives, hot swap power supplies, ECC RAM, 64 bit PCI, etc., are almost impossible to find on 1-way systems these days.

    The last time I checked, n-way systems for n > 2 were still far more expensive than n one-way systems, but I haven't checked within the past couple of months. This might have changed, but I doubt it.

    N = 2 was marginal, if I remember correctly.

    ECC RAM support is available on several single-CPU motherboards; check your favourite vendor's site for a list of options (admittedly pricier than most of the boards, but not horribly so).

    I'm assuming that hot swap power supplies aren't relevant. Your load-balancing hardware or (for a cluster) software will be able to detect malfunctioning nodes; this is essential for any cluster of significant size. A supply failing would be no different from any other component failing from a maintenance point of view (bad node is cut out of the loop by the load balancer, the hardware person gets paged, the node is swapped out and the old node serviced or gutted for parts).
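
    The "bad node is cut out of the loop" behaviour can be sketched in a few lines. This is a minimal illustration, assuming a simple round-robin policy; the class and method names are hypothetical, and real load-balancing hardware or cluster software would drive `mark_down` from its own health checks.

```python
class RoundRobinBalancer:
    """Hands out healthy nodes in rotation; failed nodes are skipped."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def mark_down(self, node):
        # In a real setup a health check would call this (and page someone).
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called once the node has been swapped out or repaired.
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none are left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

    A dead power supply then looks like any other failure: the node stops receiving requests until someone calls `mark_up` after servicing it.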

    PCI-64 support is a good point. If you have to support PCI-64, then it probably makes sense to build your cluster out of dual-CPU nodes, because the incremental cost of getting a dual-CPU motherboard will be low. Quad-CPU and higher will probably be less economical (quad cost diamonds the last time I checked). You'd only need PCI-64, though, if you either had a very large communications requirement (multiple very fast network cards per node), or if you were mounting a large RAID on the node (many controllers, many strings). In the first case, I can weasel out by claiming that you're outside of my stated problem domain (low communications bandwidth) :). In the second case, you're looking at one of a handful of disk nodes within a much larger system (in all likelihood). For non-disk nodes, you wouldn't need PCI-64. For clusters that distributed disks over many nodes, your I/O bandwidth needs would be adequately served by PCI-32, and PCI-64 again becomes unnecessary.

    It's nice to get an interesting response, though :). You've made me think about the problem in more detail.

  4. Re:Geez. by Black+Parrot · · Score: 3

    > As much as the Athlon people tout their shit as superior, and it took them HOW long to do SMP ?

    What's the ratio of AMD processor recalls : Intel processor recalls ?

    --
    Sheesh, evil *and* a jerk. -- Jade
  5. FreeBSD booting on Athlon SMP for....ages. by WasterDave · · Score: 3
    John Baldwin (@ FreeBSD.org) managed to land himself a dual Athlon board as long ago as April. Apparently it booted 5.0-current first time.

    Highlights of the dmesg for those who like that sort of thing:

    Copyright (c) 1992-2001 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD 5.0-SNAP-20010419 #1: Fri Apr 20 14:59:46 PDT 2001
    root@:/usr/src/sys/compile/GUINESS-smp
    CPU: AMD Athlon(tm) Processor (1194.68-MHz 686-class CPU)
    real memory = 1073741824 (1048576K bytes)
    FreeBSD/SMP: Multiprocessor motherboard
    cpu0 (BSP): apic id: 1, version: 0x00040010, at 0xfee00000
    cpu1 (AP): apic id: 0, version: 0x00040010, at 0xfee00000
    io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000

    Whohoo!

    Dave
    --
    I write a blog now, you should be afraid.
  6. Re:Has anyone done a comparison? by JohnZed · · Score: 4
    While you definitely have a good point about applications that lend themselves to multiple-box clusters rather than SMP, I don't think you should make such a blanket statement as:
    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth.

    I have several problems with this generalization. First, parallelizing over multiple servers always adds overhead (in both $$$ and performance) of its own. How are you going to spread a load over multiple web servers? You need a load balancer, either the dedicated (pricey) hardware kind or a standard server converted over to load balancing service (which doesn't get you the greatest speed or scalability in the world). Even in a scientific application that you spread over several boxes, you need some kind of load balancer or traffic cop to get an equitable distribution of work.

    Second, let's not forget that two-way (and even four-way, if you were in the Xeon market to begin with) boxes have gotten much cheaper in the past year or two. Most of the important server availability features, like hot swap drives, hot swap power supplies, ECC RAM, 64 bit PCI, etc., are almost impossible to find on 1-way systems these days.

    Finally, there's a huge difference between up-front cost and maintenance costs, with the maintenance usually being more expensive. If you double the amount of rack space you need, double the amount of power you need, and put in the effort to keep both systems perfectly in sync, you'll quickly find that you've blown away that little savings you got at the cash register.
    But, on the other hand, I agree that this business of benchmarking web servers with like 8 and 12 CPUs (where things really get into a different pricing league) is a bit silly.
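
    The up-front versus maintenance argument can be written down as a small calculation. This is a sketch only: the cost categories are assumptions about what matters, and every figure in the example is a made-up placeholder rather than a real price.

```python
def total_cost(purchase, yearly_power, yearly_rack, yearly_admin, years):
    """Up-front price plus recurring costs over the service life."""
    return purchase + years * (yearly_power + yearly_rack + yearly_admin)


if __name__ == "__main__":
    # Placeholder figures; substitute quotes from your own vendor and colo.
    years = 3
    one_smp_box = total_cost(5000, 300, 600, 1000, years)
    two_single_boxes = 2 * total_cost(2000, 250, 600, 800, years)
    print("one SMP box:     ", one_smp_box)
    print("two 1-way boxes: ", two_single_boxes)
```

    The point is only that the recurring terms are multiplied by the number of boxes and the number of years, so they can outweigh a cheaper purchase price.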

  7. A point of comparison: PPC by Mendenhall · · Score: 3

    Not knowing these benchmarks were available, I just spent today compiling 2.4.6-smp for my Dual 500 MHz G4 PowerMac. My complete kernel rebuild time, using 4 jobs, was 3min,10sec, putting it ahead of the Dual-PIII/1GHz but behind the Dual Athlon/1.2GHz. I was very pleased with this speed.

  8. Problems in the review by throx · · Score: 5

    Quote about freezes: "which could have been caused by either 1) nVidia driver problem (which still has a few known SMP bugs still in the latest version) or 2) the AMD 760MP chipset."

    Or a whole slew of other things like cooling, SMP problems in the IDE driver for the 760, plain bad luck that you got the 760MP both times etc. etc. Without actually nailing this down as to what specifically causes the problem you can only make VERY vague guesses about what the problem is.

    Quote from compiling the kernel: "Here, we can definitely see where AMD's superior FPU and number crunching power come into play."

    When did gcc actually use ANY floating point code? Does this guy actually understand what he's benchmarking? All sorts of effects can slow down a compile, from memory bandwidth to I/O bandwidth as well as CPU speed. It was nice to see the Athlon beat the P4, but what CPU was gcc optimised for when IT was compiled (just curious)?

    Quote from MySQL bench: "A real surprise occurred when the single processors faced off. The Athlon not only soundly beat the P3, but actually also managed to beat the dual Athlon by a little over a minute. This does seem a bit odd because going from a single P3 system to a dual P3 system decreased the time by a good 10 minutes. This could be another example of the maturity of Intel's SMP solution versus AMD's."

    It is more likely that the issue is somewhere in the I/O bandwidth chain. SQL tests tend to stress I/O bandwidth more than anything else - I'd be looking at the drivers before claiming that there are issues with the 760MP. Is MySQL multithreaded anyway so it can take advantage of dual CPUs? Most of the tests seem to show that only the OS is getting any advantage from the dual CPUs.
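
    Why a second CPU may do nothing for a benchmark like this can be illustrated with Amdahl's law, a standard textbook formula brought in here as an aid; the review did not use it. The parallel fractions below are hypothetical, since nobody in this thread has measured how much of the MySQL test can actually run in parallel.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: speedup when only part of the work can use extra CPUs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cpus)


if __name__ == "__main__":
    # Hypothetical fractions: fully serial, half parallel, fully parallel.
    for p in (0.0, 0.5, 1.0):
        print(f"parallel fraction {p}: speedup on 2 CPUs = {amdahl_speedup(p, 2):.2f}")
```

    If the benchmark is effectively one thread waiting on I/O, the parallel fraction is near zero and a dual box gains essentially nothing, whatever the chipset.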

    Quote from Blender: "It is surprising to note, however, that the Athlon, despite running 500 MHz slower than the P4, still managed to render blacksmith.blend at least a tenth of a second faster."

    No, it's not surprising. Even Intel says that x86 floating point code is slow on the P4. If Blender was rewritten to use SSE-2 instructions rather than x86 FPU instructions then I'd almost guarantee a 50% improvement in P4 scores. I'm not defending the P4 here - just saying that the P4 giving cruddy results is not surprising.

    Kudos to the author for the journalistic integrity to correct his error about NT and SMP. Anyone can be wrong - few journalists ever admit it.

    Anyway - those are my thoughts. Debate them as you will.

    --

    Fear: When you see B8 00 4C CD 21 and know what it means

  9. Perfect Post by zpengo · · Score: 3
    Something something Beowulf something something Quake something something cluster something.

    Okay, mod me up as "insightful."

    --


    Got Rhinos?
  10. Re:Older CPU's by malfunct · · Score: 4

    I've seen benchmarks of the T-bird chips done in the Tyan board with the 760MP. As far as I've ever read they are pin compatible. I think this is a case of "it will work but we don't support it". There are also some huge benefits of the MP chips vs the T-bird chips when it comes to multiprocessing because of the cache improvements in the MP chips.

    --

    "You can now flame me, I am full of love,"

  11. Server Stats would be nice by peterdaly · · Score: 3

    I would love to see some stats that are more realistic for a server environment. I don't know about MySQL, but I know postgres spawns more than one thread for additional queries when needed. Looks to me from the numbers that MySQL is only doing one thing at a time for the benchmark.

    I know my dual PIII 700 kicks ass when two big queries are going on at once, as long as they are indexed well. I have a hard time believing a single processor would still beat the dual in any useful DB test. (How many DBs really only perform one query at a time?) Two MySQL benchmarks run at the same time, and then 4, would be much more interesting to me.
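
    Running two and then four benchmarks at once could be driven by something like the sketch below. `run_benchmark` is a hypothetical stand-in that does placeholder arithmetic; a real test would replace its body with a call into an actual database client, and nothing here is part of any MySQL tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(job_id):
    """Hypothetical stand-in for one benchmark run; swap in a real client call."""
    start = time.perf_counter()
    sum(i * i for i in range(50_000))  # placeholder work, not a real query
    return job_id, time.perf_counter() - start


def run_concurrently(jobs):
    """Start `jobs` benchmark runs at once; return wall time and per-job results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(run_benchmark, range(jobs)))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for jobs in (1, 2, 4):
        wall, _ = run_concurrently(jobs)
        print(f"{jobs} concurrent job(s): {wall:.3f}s wall time")
```

    The threads only model concurrent clients. The placeholder arithmetic itself will not spread across CPUs in CPython, so any scaling you see in a real test would come from the database server doing the work.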

    Correct me if I'm wrong about how the mySql benchmark works.

    -Pete

  12. reason the kernel compile didnt gain from 2 CPUs by benploni · · Score: 5

    The reason the kernel compile didn't gain from > 2 CPUs is that the disk becomes a bottleneck. The proper way to compile a kernel on a multi-CPU machine:
    1) change the makefile to run gcc with '-pipe'. Read the man page to see why.
    2) set MAKE="make -jN", where N = number of CPUs
    3) either put the source in a ramdisk or run it on a fast striped RAID system.
    4) run make -jN (yes, both the environment variable and the argument)

    TaDa! Much faster!
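
    Picking N in the steps above can be automated. The helper below is a hypothetical sketch that only builds the `make -jN` command line from the detected CPU count; it does not touch the makefile's `-pipe` setting or the ramdisk, and `bzImage` is used purely as an example target.

```python
import os
import shlex


def make_command(jobs=None, target=""):
    """Build the argv for a parallel make; jobs defaults to the CPU count."""
    n = jobs if jobs is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("jobs must be >= 1")
    argv = ["make", f"-j{n}"]
    if target:
        argv.append(target)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    print(shlex.join(make_command(target="bzImage")))
```

    To actually start the build, the returned list could be passed to `subprocess.run(argv, cwd=..., check=True)` with `cwd` pointing at your kernel source tree.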

  13. Re:Has anyone done a comparison? by Doomdark · · Score: 3
    Although 2 x single-processor systems may in many cases be better than 1 x dual-processor system, there are some benefits to having an SMP system:
    • You can share most other components; maintainability is better (one monitor, as many HDs as you need, one case, one motherboard even though it's more expensive etc.). And even though it's kind of a "single point of failure", it's no different really from having more systems, if they all have to be up for the service to be available (ie. no redundant backup systems)
    • For closely-coupled processes SMP is faster than UPs talking via Ethernet; most web-servers talk to databases, and direct communication is more efficient than talking via 100mb (or even gig) Ether. And DBs generally scale quite nicely to multiple SMP systems.
    • A bit irrelevant here, but I certainly enjoyed the dual-PII workstation I had a few years back... interactive response _is_ much better (on Linux too)

    So... it's more convenient IMO to have a dual/quad system than N single-CPU systems.

    Most important, though, is what everyone and their donkey has said; it all depends on what you plan to do with your system.

    --
    I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
  14. Re:TransMeta? by tshak · · Score: 4

    Because TransMeta doesn't make processors to compete on speed, rather battery life and portability.

    --

    There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
  15. SMP Quake3 by Freddy_K · · Score: 4
    In response to:
    "Unfortunately, after doing some testing and analyzing the results, it appears that SMP Quake3 under linux isn't running at 100%. And when I say that, I mean it doesn't run at all. After trying to enable it with a "r_smp 1" command and a restart, I noticed this error message in the console log: "Trying SMP acceleration... failed". Not good. So, off to Google Groups I go to see if anyone else has had any success. After browsing through what seemed like hundreds of message board posts and pages, we were not able to find anyone who had this working successfully. If someone knows how to get this working, we'd love to hear about it!"
    I emailed TTimo at id about it and here's what he had to say:
    You were not able to turn on SMP in Quake III Arena linux .. simply because it is not available yet. Id has never released a linux binary of Quake III Arena with SMP support. That's why you get the "trying SMP acceleration .. failed" message. We have in-house binaries though, and it's on the TODO list ... "when it's done"


    TTimo

    --

    Linux Quake III Arena / Quake III: Team Arena

    Id software

  16. Mirror by LoudMusic · · Score: 5
    A mirror

    ~LoudMusic

    --
    No sig for you. YOU GET NO SIG!
  17. Has anyone done a comparison? by ageitgey · · Score: 5
    Has anyone done a cost-efficiency comparison of dual-cpu performance vs. a single cpu when considering the costs involved (special SMP boards, etc.)? In other words, is it more economical to buy two web servers or one smp server with tons of ram? Do certain applications (cpu intensive obviously) save money with SMP systems versus others that depend on IO throughput, etc., and what applications are those? I'm really interested in knowing with better evidence than "well, I think..."

    --
    Uninnovate - Only the finest in engineering.
  18. Athlon MP == Athlon 4 by CtrlPhreak · · Score: 5

    The Athlon MP is actually the same core (Palomino) as the upcoming Athlon 4; the only difference is that the Athlon MP has been 'certified' by AMD to run in SMP configurations. There is no change in the socket or connections: all AMD Socket A processors are compatible with the AMD 760(MP) based boards, and the boards will be compatible with the Athlon 4.

    --
    WikiAfterDark.com It's a sex wiki, go now!
  19. Re:(kind of) ontopic by ocbwilg · · Score: 3

    obtw- Intel does not own the Athlon core, only the Alpha EV6 bus it runs on.

    Which AMD already licensed from Compaq/Digital, so they should have rights to it for as long as they need them. This is further supported by statements from Compaq officers that the sale of the Alpha designs to Intel was not an exclusive deal, so it would seem that Compaq/Digital could continue to license the technology to additional companies.

    Say "NO!" to tax money for religious groups.