Slashdot Mirror


AMD Says Barcelona Will Outperform Clovertown

Dysfnctnl85 points out a ZDNet Blog posting in which AMD claims that its upcoming quad-core "Barcelona" chipset should be 40% faster than "Clovertown," Intel's quad-core Xeon 5300 line. AMD says that the introduction of Barcelona marks a shift in their strategy from emphasizing price to performance. The post goes on: "Intel is eager to claw back some of the server market share from AMD, and this is where Clovertown comes in... The Xeon 5300 line will represent excellent value for money since Intel plans on pricing them the same as its dual core Xeon 5100 processors. That could make things tough for AMD."

14 of 153 comments

  1. If only I/O speeds could also grow as fast by namityadav · · Score: 4, Insightful

    The way AMD and Intel are improving the processor speed is very impressive. I/O speed is going to become an even clearer bottleneck now.

    1. Re:If only I/O speeds could also grow as fast by slamb · · Score: 5, Insightful

      You could turn it around and say that, since the disks are not using their full bandwidth, the disks spend most of their time waiting for requests.

      Only by specious reasoning. I'll disprove by counterexample. If I continuously tell the disk to seek to one extreme and read a cacheful, then seek to the other extreme and read a cacheful, it will neither be waiting for requests nor using its full bandwidth. A and not B disproves (A => B).

      Latency and throughput are unrelated only if there can be infinitely many requests produced and satisfied in parallel. In the case of a hard disk, there can be only one active request per head because it can only be at one place at once. Let's consider the example of my laptop hard drive. It's rated at a data transfer rate of 150 MB/s. But look at the seek speeds - 1.5ms minimum, 12ms average read, 22 ms maximum. It can read a 1 MB file in 6.7 ms, but if that 1 MB file is fragmented into ten chunks across the drive, it'll take around 130 ms.[*] So in this case it actually transfers at 5% of its rated speed. And depending on the application, the data may be in many, many tiny chunks.
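
The arithmetic in that paragraph can be checked with a short sketch. It uses only the figures quoted in the comment (150 MB/s rated, 12 ms average seek) and assumes a deliberately crude model: one average seek per fragment and no seek at all for a contiguous file. The function names are made up for illustration.

```python
# Figures quoted in the comment; the one-seek-per-fragment model is an assumption.
RATED_MB_PER_S = 150.0   # rated transfer rate
AVG_SEEK_MS = 12.0       # average seek time


def read_time_ms(size_mb, fragments, seek_ms=AVG_SEEK_MS, rate=RATED_MB_PER_S):
    """Time to read size_mb split into `fragments` chunks, one seek per chunk.

    A contiguous file (fragments == 1) is assumed to need no seek,
    which matches the 6.7 ms figure in the comment.
    """
    transfer_ms = size_mb / rate * 1000.0
    seeks = fragments if fragments > 1 else 0
    return transfer_ms + seeks * seek_ms


def fraction_of_rated(size_mb, fragments):
    """Effective throughput as a fraction of the rated transfer rate."""
    return read_time_ms(size_mb, 1) / read_time_ms(size_mb, fragments)


if __name__ == "__main__":
    print(f"contiguous 1 MB:  {read_time_ms(1, 1):.1f} ms")
    print(f"ten fragments:    {read_time_ms(1, 10):.1f} ms")
    print(f"fraction of rated speed: {fraction_of_rated(1, 10):.0%}")
```

Under these assumptions the fragmented read comes out a little under 130 ms, or roughly 5% of the rated speed, in line with the comment's estimate.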

      That being said, disk latency is one of the major causes of poor performance. But "bottlenecks" only have to do with throughput.

      Latency limits throughput. The requestee usually can only satisfy a limited number of requests at once (see above), and the requestor may not be able to produce the next request until it's received the previous response.

      Simple example: I'm performing a binary search. I need to see what's at location mid before I know if I'll next be interested in location (low+mid)/2 or location (mid+high)/2. In some cases, I can do a speculative fetch for both locations, but you can only extend that out so many generations before you've used up most of your bandwidth on data you'll never use.
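
A minimal sketch of that trade-off, assuming each probe costs one round trip to storage and that a speculative fetch grabs every candidate midpoint for the next `depth` generations in one go. The function names and the cost model are illustrative, not taken from any real I/O library.

```python
import math


def dependent_probes(sorted_items, target):
    """Plain binary search; returns (index or None, number of probes).

    Each probe depends on the result of the previous one, so the probes
    cannot overlap: total time is roughly probes * latency.
    """
    low, high, probes = 0, len(sorted_items) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1                      # one round trip to storage
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return None, probes


def speculative_cost(n, depth):
    """(round trips, locations fetched) when prefetching `depth` generations.

    Each round fetches all 2**depth - 1 candidate midpoints, although at
    most `depth` of them lie on the path the search actually takes.
    """
    levels = math.ceil(math.log2(n + 1))   # worst-case probes for n items
    rounds = math.ceil(levels / depth)
    return rounds, rounds * (2 ** depth - 1)


if __name__ == "__main__":
    n = 1_000_000
    print(dependent_probes(range(n), 123_456))
    for depth in (1, 2, 4, 10):
        rounds, fetched = speculative_cost(n, depth)
        print(f"depth {depth:2d}: {rounds:2d} round trips, {fetched} fetches")
```

In this model, going from depth 1 to depth 4 cuts round trips from 20 to 5 but raises fetches from 20 to 75, and the fetch count keeps doubling with each extra generation.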

      Processors are smart about re-ordering instructions to keep working while they're waiting for stuff to happen, but still they frequently get to a point where they can't execute anything more because of ordering constraints - the results of some instruction are dependent on a previous instruction that hasn't completed yet because it's waiting for a value from memory. That value can be the actual instruction to be executed or an operand...either way, your shiny new processor's stuck doing nothing.

      [*] - It might beat the average if it's smart about ordering. At the very least, 22 ms has to get added if one request is at one extreme and one request is at the other extreme. That brings it down to 23% of the rated speed.

    2. Re:If only I/O speeds could also grow as fast by Mad Merlin · · Score: 4, Informative

      Let's consider the example of my laptop hard drive. It's rated at a data transfer rate of 150 MB/s.

      A SATA 1 interface can transfer at a maximum of 150 megs/s, but your hard drive can't. On sequential reads, you're unlikely to see much higher than 40 megs/s; even 7200 RPM desktop drives don't exceed 70 megs/s yet.

  2. 40 % faster, after a nap by Original Replica · · Score: 5, Funny

    Everyone knows that in Barcelona they take a siesta. So don't plan on using your computer between noon and 1.

    --
    We are all just people.
    1. Re:40 % faster, after a nap by macadamia_harold · · Score: 5, Funny

      Everyone knows that in Barcelona they take a siesta. So don't plan on using your computer between noon and 1.

      Yeah, but if it's 40% faster, an hour-long siesta should only take 35 minutes.

  3. upcoming chipset? by aczisny · · Score: 4, Insightful

    FTFS:

    "Barcelona" chipset should be 40% faster than "Clovertown,"

    You'd think that since the blog got it right that Barcelona is the upcoming processor from AMD and Clovertown is a processor codename from Intel, the summary could have gotten it right too. Do submitters not read the articles anymore either?

    --
    Now, landing thrusters.. landing thrusters, hmm. Now if I were a landing thruster, which one of these would I be?
  4. The bounty of true competition by j. andrew rogers · · Score: 4, Insightful

    Whether AMD or Intel is producing the fastest, cheapest, most scalable, or most efficient processor at the moment is not terribly important.

    What *is* important is that when you have two companies in genuine fierce competition at the bleeding edge of technology and performance, they extract an impressive amount of productivity and effort out of their engineering and science assets. Free markets are at their best when all the major players have a healthy fear of the capabilities of their competitors.

  5. Re:Linux on quad-core by be-fan · · Score: 4, Funny

    Linux is a preemptible kernel.

    --
    A deep unwavering belief is a sure sign you're missing something...
  6. Re:Don't believe this by be-fan · · Score: 4, Insightful

    All Intel has to do is turn up the clock the day before Barcelona ships. We already know that the Core 2 Duo chips are very overclockable, and getting another 40% -- or even 50%+ out of them -- shouldn't be a problem.

    The performance a chip can get with overclocking is way higher than what the manufacturer can deliver in final products. They have to be highly reliable at their specified clockspeed with (relatively) poor cooling, and while meeting the given voltage and thermal dissipation specifications. I've seen the Core 2 over-clock to 3.5 GHz (with conventional cooling) online, but how many of those are doing it at the stock Vcore while staying within the 65 watt TDP?

    --
    A deep unwavering belief is a sure sign you're missing something...
  7. Re:Don't believe this by TheThiefMaster · · Score: 5, Insightful

    We are talking a SERVER line of cpus here. EE chips are a desktop cpu brand.

    For servers, TDP is incredibly important: because server rooms are air-conditioned, a room full of higher-TDP CPUs costs much, much more to run from an electricity point of view.

    That's not to say that they won't overstep their vcore or TDP limits to get the upper hand on performance, but that wouldn't win them the performance/watt ratio crown that's the all-important stat for server cpus.

  8. Mainly in FP by Visaris · · Score: 4, Informative

    This 40% faster than Clovertown claim is only referring to FP code. The integer side is not nearly as clear. Expect AMD to improve integer performance over K8, but I don't expect any miracles. Here is a small list of improvements Barcelona will have over K8:

    - Double L1 cache bandwidth
    - Double FP units
    - Single-cycle SSE (vs K8's 2-cycle)
    - More fast-path decoding
    - Double TLB size
    - Independent DDR channels
    - More cache (L3)
    - Out-of-Order loads
    - New instructions (LZCNT, POPCNT, EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
    - Double prefetch (from 16 bytes -> 32 bytes)
    - Larger Branch Target Buffer
    - Larger Out of Order (OoO) buffers
    - Support for new HT standard (3.0)

    --

    I am a viral sig. Please help me spread.
    1. Re:Mainly in FP by Erich · · Score: 4, Insightful

      New instructions (LZCNT, POPCNT, EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)

      Interesting!

      I can't find much information on it, but I'm guessing "LZCNT" is count-leading-zeros. This is like "find-first-one" from the other direction. It's very useful for things like finding the magnitude of an unsigned number. It's used quite often on architectures without FPUs (like ARM) in floating point routines for renormalization. I guess it could also be useful if you are having to do floating point emulation for numbers with enormous precision.

      I guess if you have "BSR" then, for a nonzero operand, LZCNT = (operand width - 1) - BSR.

      POPCNT is probably population count, the number of 1s in a value.

      Both LZCNT and POPCNT are instructions that are a pain to do in software if you lack the instruction in the hardware, and they are relatively cheap (especially if you have BSF/BSR already).
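
For illustration, here is a sketch of what these operations compute, written in software. The function names are made up for this example; `int.bit_length` is the standard Python method. It assumes unsigned inputs and a 32-bit operand by default, and it follows the hardware convention that LZCNT of zero returns the operand width while BSR of zero is undefined.

```python
def lzcnt(x, width=32):
    """Count leading zero bits of an unsigned `width`-bit value.

    Returns `width` when x == 0, as the LZCNT instruction does.
    """
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit in the given width")
    return width - x.bit_length()


def bsr(x):
    """Index of the highest set bit, which is what BSR computes."""
    if x <= 0:
        raise ValueError("BSR is undefined for zero")
    return x.bit_length() - 1


def popcnt(x):
    """Population count: clear the lowest set bit until none remain."""
    if x < 0:
        raise ValueError("expected an unsigned value")
    count = 0
    while x:
        x &= x - 1      # drops the lowest set bit
        count += 1
    return count


if __name__ == "__main__":
    for value in (1, 0b1011, 0xFFFF, 0x80000000):
        print(f"{value:#012x}: lzcnt={lzcnt(value):2d} "
              f"bsr={bsr(value):2d} popcnt={popcnt(value):2d}")
```

The loop in `popcnt` runs once per set bit, which hints at why a single-cycle hardware instruction is attractive for code that counts bits in a hot path.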

      I'm still a bit surprised that there aren't a few more of these bit-banging instructions in x86, like bit interleave/deinterleave and bit reverse. Modern processors are doing enough signal processing work that one would think you'd throw the tools in the bucket, as cheap as they are. I guess lookup tables are good enough.

      What's the over/under for which SSE revision will add a galois field multiplier? 7? 8?

      But seriously, the dual ported caches are probably the best improvement for most people. You can't be too rich, too thin, or have too much memory bandwidth.

      It looks like AMD has done the same thing Intel did with "Core 2"... just take a good architecture and keep making improvements... more issue width, more memory bandwidth, more flexibility in scheduling. Every bit counts.

      I think we're getting to a similar point in modern CPU microarchitectures to where we are in some other industries, where drastic improvements are much more rare and it all comes down to really great implementation... like making engines. There are some innovative ideas for engines, and certainly a lot of people experiment, but really the best designs are just really well balanced and tuned. (although more cylinders is usually a good thing for horsepower).

      --

      -- Erich

      Slashdot reader since 1997

  9. Re:Linux on quad-core by MoralHazard · · Score: 4, Informative

    Urm, that depends. Linux CAN be a pre-emptible kernel, if you compile it to be. There are various levels of pre-emptibility, depending on your needs. The in-kernel docs say that pre-emption is intended for desktop environments where perceived latency is a big deal, but servers will probably benefit from the lessened overhead of a non-pre-emptible configuration.

    But the original poster's comment is still bullshit. Windows Vista is a microkernel? What has THAT guy been smoking? Multi-core designs aren't that different from multi-CPU configurations, and we already know from experience that Linux hasn't been sidelined performance-wise.

    Actually, now that I think about it, the likeliest explanation is that the OP was just trolling.

  10. Marketing speak by suv4x4 · · Score: 4, Funny

    Intel's quad-core Xeon 5300 line. AMD says that the introduction of Barcelona marks a shift in their strategy from emphasizing price to performance

    The way they spun it, you can also claim they changed their strategy from slow to expensive.