Slashdot Mirror


Intel Cascade Lake-AP Xeon CPUs Embrace the Multi-Chip Module (techreport.com)

Ahead of the annual Supercomputing 2018 conference next week, Intel today announced part of its upcoming Cascade Lake strategy. From a report: The company teased plans for a new Xeon platform called Cascade Lake Advanced Performance, or Cascade Lake-AP, this morning ahead of the Supercomputing 2018 conference. This next-gen platform doubles the cores per socket from an Intel system by joining a number of Cascade Lake Xeon dies together on a single package with the blue team's Ultra Path Interconnect, or UPI. Intel will allow Cascade Lake-AP servers to employ up to two-socket (2S) topologies, for as many as 96 cores per server.

Intel chose to share two competitive performance numbers alongside the disclosure of Cascade Lake-AP. One of these is that a top-end Cascade Lake-AP system can put up 3.4x the Linpack throughput of a dual-socket AMD Epyc 7601 platform. This benchmark hits AMD where it hurts. The AVX-512 instruction set gives Intel CPUs a major leg up on the competition in high-performance computing applications where floating-point throughput is paramount. Intel used its own compilers to create binaries for this comparison, and that decision could create favorable Linpack performance results versus AMD CPUs, as well.

72 comments

  1. embrace by Anonymous Coward · · Score: 0

    this first post lmao

    1. Re:embrace by Anonymous Coward · · Score: 0

      embrace deez nutz

  2. Linpack throughput by Anonymous Coward · · Score: 1

    Synthetic benchmark completely rigged to give Intel's kit an advantage does indeed give it an advantage, news at 11.

    1. Re:Linpack throughput by LoganTeamX · · Score: 1

      This. Also, "Intel forges ahead using their processor-specific compilers and instructions, ensuring their stranglehold over FPU instruction set adoption remains one generation ahead of the competition's ability to license access to said instructions."

      --
      One of the 187.
    2. Re:Linpack throughput by Tough+Love · · Score: 1

      Right, really just benchmarking AVX512 for a load that doesn't even make sense on the CPU, should be GPGPU. AMD will provide details on Rome (64 core Epyc) tomorrow, apparently. This is Zen 2 said to offer 13% IPC boost over Zen. When Rome comes out, Intel will most probably be a node behind for the first time ever.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    3. Re:Linpack throughput by Agripa · · Score: 1

      Synthetic benchmark completely rigged to give Intel's kit an advantage does indeed give it an advantage, news at 11.

      Hey, if AMD wanted better results then they should have used their own compiler on Intel's benchmark. It is not Intel's fault that Intel's compiler selects what processor features to use based on CPUID rather than capability.

  3. Times change by Tsolias · · Score: 5, Interesting

    1.5 years ago: Glued together CPUs BAD
    Now: Glued together CPUs GOOD

    1. Re:Times change by Octorian · · Score: 1

      25 years ago... multi-chip module good.
      https://en.wikipedia.org/wiki/POWER2

      Seems like IBM had trouble actually fitting an entire POWER CPU on a single die back then. So they started with the CPU being a whole card of individually packaged chips, then later moved onto a multi-chip module. Makes me wonder if anyone else did the MCM thing, since I haven't seen non-IBM old computers taking that approach.

    2. Re:Times change by vux984 · · Score: 1

      Your own wikipedia link mentions AMD Socket G34 and SP3 processors; the WiiU, Pentium Pro, and Pentium D, amongst others...

    3. Re:Times change by Anonymous Coward · · Score: 0

      Sure, wait for the new single-die processors on Intel's 10 nm process - expected in a year or so.

      We really wonder about the clock frequencies, cache sizes and power consumption (the price being hard to beat).

    4. Re:Times change by Anonymous Coward · · Score: 0

      Yeah - I bought a Pentium D when it came out... What a steaming pile of garbage that was.

    5. Re:Times change by rogoshen1 · · Score: 1

      i think 'smoldering' or 'smoking' is the word you're looking for? Unless it was water cooled of course, if so, carry on!

    6. Re:Times change by Tsolias · · Score: 1

      some of those don't have NUMA... so it's literally 2 chips glued together.

    7. Re:Times change by Tsolias · · Score: 1

      I was referring to this
      https://www.pcgamer.com/intel-...

    8. Re:Times change by Anonymous Coward · · Score: 0

      Some like Pentium D, Core 2 Quad and Via Nano X4 are so dumb indeed they are just wired to the FSB. So there is a long path from one die to the other, I think to the motherboard chipset back and forth. This worked fine as Intel developed a very fast FSB for the Pentium 4.
      I think the first 8-core Mac Pro (two quad core Xeon) is thus effectively four dual core CPUs sharing a bus. Intel also sold this in a limited way to the "exxxtreme gamerzz". Quad core/eight threads Nehalem (a single CPU) was faster.

      An underappreciated MCM: Intel laptops with CPU + chipset MCM. That may be a factor in their still dominating laptops. Good work. Can still have SO-DIMM and M.2 slots with it.

  4. Times change-Coprocessor. by Anonymous Coward · · Score: 0

    The AVX-512 instruction set gives Intel CPUs a major leg up on the competition in high-performance computing applications where floating-point throughput is paramount.

    The nature of the AMD link is that it could add a co-processor to its design easily. That's one of the reasons both Intel and AMD are going this way: it makes it easier to adjust to the market.

    1. Re:Times change-Coprocessor. by Agripa · · Score: 1

      The nature of the AMD link is that it could add a co-processor to its design easily. That's one of the reasons both Intel and AMD are going this way: it makes it easier to adjust to the market.

      The reason is that there are three ways to follow Moore's law of decreasing the cost per transistor; increased transistor density, increased area, and denser packaging. The first two have largely reached their limit so it is time to turn the packaging crank.

    2. Re:Times change-Coprocessor. by Anonymous Coward · · Score: 0

      I think Intel already has a socketed MCM on the market, not for coprocessor but instead it's Xeon + 100Gbps interconnect (for a supercomputer/cluster or a custom system with bigger socket count)

      AMD uses their stuff internally (inside the die) too, as can be seen on the Ryzen 2400G etc. APUs, and will extend it to external big GPUs. That's where you may more easily find coprocessors, extend it to go to PCIe 4.0 x16 slots - IBM already has this (OpenCAPI)
      Nvidia can also in principle make MCMs with the NVLink interface. Works with multiple GPU boards anyway. Found on really expensive systems.. or for consumers, as RTX 2080 SLI (two GPU max)

      The nature of the AMD link is that it could add a co-processor to its design easily.

      We might see it, I guess you need to contract with AMD's semi-custom division so as to develop an MCM with your coprocessor. Easy but an expensive affair.

  5. Problem solving, Intel style by Pollux · · Score: 1

    Intel, circa 2017: "We cannot figure out how to successfully engineer 10nm wafers. Our tick-tock strategy is stalled, and we cannot design chips that are any faster. What should we do?"
    Intel Solution: "MORE CORES!"

    Intel, circa 2018: "AMD just released Ryzen, and it's destroying us in benchmarks. Anyone figure out that 10nm thingie yet?"
    Intel Solution: "Nope. But we did add MORE CORES!"

    Now must be a great time to be an Intel engineer.

  6. Buzzword soup by Anonymous Coward · · Score: 0

    But did they get around to fixing those horrible information leaks they designed right into their CPUs? If not, when will they get to it? Ever?

    Meanwhile, it's easy to forget that IBM still produces faster CPUs with more cores and more (actual-SMT-)threads per core and more cache and more performance. Mere mortals can't actually buy them, but they do exist. intel has always been a poor man's game in comparison.

    1. Re:Buzzword soup by Waffle+Iron · · Score: 2

      But did they get around to fixing those horrible information leaks they designed right into their CPUs? If not, when will they get to it? Ever?

      This does address those issues: Their theory is that with so many cores thrown at the workload, the chance of malware even finding the core that is working on sensitive information is negligible.

    2. Re:Buzzword soup by petermgreen · · Score: 1

      https://www.raptorcs.com/conte...

      Expensive certainly but not totally impossible for a "mere mortal" to buy.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    3. Re:Buzzword soup by Anonymous Coward · · Score: 0

      Actually the 18 core (72 threads) CPU is not that expensive, especially for a chip with 90MB of L3 cache. The 22/88/110 is another matter.
      The CPU also has four DDR4 (with ECC) channels and 48 PCIe Gen4 lanes.
      The real issue is the cost of the motherboard, and maybe the cooling: the 18 core version has a TDP of 160W.
      Bottom line: you can have a really fast Power9 machine for ~$10000, with 36 cores/144 threads, with 128 or 256GB of ECC DRAM, state of the art I/O and memory bandwidth.
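
The thread counts quoted above follow from 4-way SMT, which is what the 18-core/72-thread figure implies. A minimal sketch of that arithmetic, where the `threads` helper and the constant name are made up for illustration:

```python
# Hypothetical helper: hardware threads for a POWER9 box as described above.
SMT_WAYS = 4  # inferred from the comment's "18 core (72 threads)"


def threads(cores_per_socket: int, sockets: int = 1, smt: int = SMT_WAYS) -> int:
    """Total hardware threads = sockets * cores per socket * SMT ways."""
    return sockets * cores_per_socket * smt


print(threads(18))             # single 18-core chip
print(threads(22))             # single 22-core chip
print(threads(18, sockets=2))  # dual-socket system, 36 cores in total
```

The dual-socket call reproduces the 36 cores/144 threads figure given for the ~$10000 machine.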

    4. Re:Buzzword soup by petermgreen · · Score: 1

      Yeah, I was looking at the price for a complete system.

      Looking at the bits separately, the CPU prices do indeed seem reasonable, the mainboard prices on the other hand make small core count systems prohibitively expensive.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  7. Licensing implications? by Anonymous Coward · · Score: 0

    What are the implications for per-core or per-CPU licensing schemes? It's already bad enough as it is.

    1. Re:Licensing implications? by sinij · · Score: 1

      The implication is that Oracle, VM and such will continue having their way with you, and you will continue squealing like a little piggy while taking it.

    2. Re:Licensing implications? by OtisSnerd · · Score: 1

      The implication is that Oracle, VM and such will continue having their way with you, and you will continue squealing like a little piggy while taking it.

      Which is why getting a dose of Intel's 'CLAP' is such a bad idea.

    3. Re: Licensing implications? by Anonymous Coward · · Score: 0

      Yep. Oracle SE licensed per socket, except for multi chip modules. If you run SE, avoid these processors.

      Or buy lube.

  8. Herp_AMD_derp by Anonymous Coward · · Score: 0

    N/t

  9. Intel says Intel CPUs are great by sinij · · Score: 3, Informative

    Intel says Intel CPUs are great. Yeah, what else are they going to say?

    It is all marketing hype until independent third-party benchmarking is done.

    1. Re:Intel says Intel CPUs are great by Anonymous Coward · · Score: 0

      independent third-party

      Like Tom's Hardware?

    2. Re:Intel says Intel CPUs are great by Anonymous Coward · · Score: 0

      Just buy it.

  10. Re:Vote Democrat! Abolish ICE! Blue Wave! by Anonymous Coward · · Score: 0

    Buy Intel!.

  11. New measurement standard by Anonymous Coward · · Score: 0

    Fuck linpack. We need a new measurement standard. Vulnerabilities per release.

  12. Hang on a minute by jon3k · · Score: 1

    One of these is that a top-end Cascade Lake-AP system can put up 3.4x the Linpack throughput of a dual-socket AMD Epyc 7601 platform. This benchmark hits AMD where it hurts.

    Now let's see what it costs.

    1. Re:Hang on a minute by Tough+Love · · Score: 1

      Let's see if they can actually make them. Intel's purported Threadripper answer i9-9900k is still out of stock. You can try a scalper for $1k. Then there are persistent tales of overheating, you can't cool it even with a normal water cooler.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  13. how many pci-e lanes in 1 Socket and 2 socket? by Joe_Dragon · · Score: 3, Insightful

    how many pci-e lanes in 1 Socket and 2 socket?

    With AMD you have 128 with one or 2

    1. Re:how many pci-e lanes in 1 Socket and 2 socket? by Agripa · · Score: 1

      how many pci-e lanes in 1 Socket and 2 socket?

      With AMD you have 128 with one or 2

      None, but they made it thinner, removed the analog headphone jack, and added a notch.

    2. Re:how many pci-e lanes in 1 Socket and 2 socket? by Anonymous Coward · · Score: 0

      Nobody cares that you own an iphone...

      --Highdude702(mods)

  14. mac pro will have this late 2018 starting at 7-10K by Joe_Dragon · · Score: 1

    mac pro will have this late 2018 starting at 7-10K (dual cpu base with fully loaded ram channels and crap base video card)

  15. Why stop at 2? by Anonymous Coward · · Score: 0

    Inmos linked over a thousand processors together. You could even have a Beowulf cluster of them!

    Seriously, bung a large amount of RAM onto some chips, then add maybe 8 of these CPU chips, and put the whole lot in one case. Not quite System on a Chip, but it's close enough.

  16. Intel Compiler by Luthair · · Score: 2

    Specifically blocks non-Intel CPUs from getting an optimized code path, hardly shocking their CPU performs a lot better.

    1. Re:Intel Compiler by DamnOregonian · · Score: 1

      Actually, chooses a specially optimized path for GenuineIntel parts, and falls back to a standard path for non.
      Now I don't disagree with you that this sucks. However, I also think you're a disingenuous shit for wording it the way you did. Hurrah for the death of intellectual honesty.

    2. Re:Intel Compiler by Anonymous Coward · · Score: 0

      The path is so "optimized" that Intel claims one must disable hyperthreading and SMT (halving AMD's threads) to achieve maximum performance. The intellectual dishonesty starts with Intel. It's really laughable.

    3. Re:Intel Compiler by Agripa · · Score: 1

      Actually, chooses a specially optimized path for GenuineIntel parts, and falls back to a standard path for non.

      Now I don't disagree with you that this sucks. However, I also think you're a disingenuous shit for wording it the way you did. Hurrah for the death of intellectual honesty.

      Luthair was more accurate than you. Intel's compiler disables the use of features advertised by the processor in the feature flags like instruction set extensions if the CPUID is not GenuineIntel. The "standard path" is to ignore the processor features.

      After Intel lost the lawsuit over this, the court required them to advertise this fact which they did by posting a non-searchable graphics image with the text as a fuck you to the judge.

    4. Re:Intel Compiler by DamnOregonian · · Score: 1

      The "standard path" is to ignore the processor features.

      No, that is not how compilers work. Try again.

    5. Re:Intel Compiler by Anonymous Coward · · Score: 0

      The compiler works however the person(s) who wrote it tell it to work. Do you not know how computers work? People give them instructions...

      --Highdude702(mods)

      P.S. you can find multiple articles from this lawsuit he is talking about by searching "amd vs intel lawsuit"

    6. Re:Intel Compiler by Luthair · · Score: 1

      You need to do more reading friend. https://www.agner.org/optimize...

    7. Re:Intel Compiler by DamnOregonian · · Score: 1

      I'm very familiar with the lawsuit.
      The compiler works exactly as I said it did.
      The standard path excludes SSE support. If the runtime code detects a GenuineIntel part, it checks the CPUID flags to see if it has SSE support, and then utilizes the accelerated functions.
      As I said, you can argue all day whether that's crappy behavior (it's not like AMD chips don't have equivalent CPUID registers) but it is factually incorrect to say that they neuter code on AMD processors. They refuse to accelerate it on non-GenuineIntel parts.
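
A rough sketch of the two dispatch policies being argued about in this subthread, written as a plain Python simulation rather than real CPUID code. The `Cpu` class, the function names, and the `"sse"`/`"baseline"` labels are all hypothetical and only illustrate the logic described above; this is not Intel's actual dispatcher.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cpu:
    vendor: str  # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset = field(default_factory=frozenset)  # advertised feature flags


def pick_path_vendor_gated(cpu: Cpu) -> str:
    """Policy described above: feature flags are only consulted on GenuineIntel parts."""
    if cpu.vendor == "GenuineIntel" and "sse" in cpu.features:
        return "sse"
    return "baseline"


def pick_path_feature_based(cpu: Cpu) -> str:
    """Alternative policy: trust the advertised feature flags regardless of vendor."""
    return "sse" if "sse" in cpu.features else "baseline"


intel = Cpu("GenuineIntel", frozenset({"sse"}))
amd = Cpu("AuthenticAMD", frozenset({"sse"}))

for cpu in (intel, amd):
    print(cpu.vendor, pick_path_vendor_gated(cpu), pick_path_feature_based(cpu))
```

Under the vendor-gated policy both chips advertise the same flag but only the GenuineIntel one gets the accelerated path, which is the behavior both sides of the argument agree on; the disagreement is over how to describe it.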

    8. Re:Intel Compiler by DamnOregonian · · Score: 1

      The compiler works however the person(s) who wrote it tell it to work. Do you not know how computers work? People give them instructions...

      Let's look at the factual evidence. Intel compiler related code runs slower in some cases on AMD processors than Intel processors.
      There could be 2 reasons for this:
      1) The standard code path uses SSE instructions to accelerate performance, but those are overridden by safe functions if an AMD processor is detected.
      2) The standard code path is unaccelerated and switches in SSE instructions if it detects a GenuineIntel part with SSE support.

      One of those is legally defendable, one is not.
      Given that AMD *failed* to win an injunction to change the behavior of the compiler, which of those do you think is the actual methodology?

      P.S. you can find multiple articles from this lawsuit he is talking about by searching "amd vs intel lawsuit"

      You can indeed. And you apparently failed to understand them.
      You're right, compilers work how the person(s) who wrote it tell it to work. But you assume a completely illogical code path must be the case, when there is an obvious logical one that produces the same result. I can only conclude that you're an idiot.

  17. Re:mac pro will have this late 2018 starting at 7- by Anonymous Coward · · Score: 0

    2018 Mac Pro?

    this is a joke right.

  18. intel is losing big time to amd by Joe_Dragon · · Score: 1

    intel is losing big time to amd

  19. 340% faster? by augustz · · Score: 1

    Does no one just ask - is this even a reasonable claim?

    Intel is going to be on an OLDER process node - their architecture is not running 340% faster than AMD.

    Intel is comparing their theoretical future chip with Epyc chips shipping now. https://www.newegg.com/Product...

    Once these chips are available in quantity (they are not), drop them into some servers and start benchmarking them on performance per dollar and per watt. And compare them to AMD chips coming out at that time.

    If this is the benchmark that is hitting AMD where it "hurts", AMD is in a good position. When your competitor benchmarks their future products against your current products instead of their current ones, you KNOW you are good.

    And the latest set of benchmark shenanigans don't look good for intel either.

    1. Re:340% faster? by Anonymous Coward · · Score: 0

      is this even a reasonable claim?

      With Intel using Intel's compiler and Intel's Marketing department releasing the numbers? Probably not.

    2. Re:340% faster? by Anonymous Coward · · Score: 0

      Psst, Linpack uses Intel's own libraries. They practically wrote the benchmark. It uses AVX512 which AMD doesn't even support (AMD really only supports 128-bit SIMD; they support AVX2 enough to run it, but Zen/Zen+ CPUs have 1/2 AVX2 throughput).

      Try running Cascade Lake-AP in some non-AVX2/AVX512 benches and you'll get more realistic results.

      It's nothing but MCMed Skylake-SP with some hardware mitigations for Spectre/Meltdown. That's it. Oh and apparently they cut the socket count in half. No 4P Cascade Lake-AP for you!

    3. Re:340% faster? by Anonymous Coward · · Score: 0

      Does no one just ask - is this even a reasonable claim?

      Sure, for some value of reasonable. You may safely assume this 340% is for a very narrow benchmark of operations perfectly suited to AVX-512. For that fraction of the market that cares deeply about those operations Intel is the answer. And that fraction of the market may actually be large in terms of unit shipments. Most buyers, however, aren't likely to care much.

      I remember seeing this same behavior in the CISC vs RISC era. The various vertical systems manufacturers (Digital, Sun, HP, etc.) usually produced FLOPS figures far beyond Intel. The integer performance delta was either much smaller or non-existent and the price difference was vast.

      Nearly all of those designs and their amazing FLOPS benchmarks are legacy curiosities now. Only some vestiges remain. Supposedly Oracle still ships some SPARC but they've spaced the staff so that's on a clock. IBM is making POWER stuff for their deepest pockets, and that will probably continue until the US government and western banks have a big hiccup and have to stop pissing away money. Intel is shipping their last Itaniums; they announced the end of Itanium last year and hardly anyone noticed. The truth is that if all of those chips vanished tomorrow the announcements wouldn't survive a full news cycle.

      Intel is on the back foot. Their marketing people are trying to make the best of a bad situation. It takes Intel around 5 years to straighten itself out when competitors start nipping at them. It happened during the 386/486 era when a bunch of clone CPUs appeared. It happened again when AMD revolutionized x86 with 64 bits and showed the world just how foolish Intel had been neglecting x86. Intel recovered each time. I'm looking forward to what they come up with this time.

    4. Re:340% faster? by Anonymous Coward · · Score: 0

      I support AMD (my last two processors have been an AMD 8350 and a Ryzen 2700X) but AVX512 is fucking awesome. 16 32bit floating point operations in one instruction gives me a semi. Let's give credit where it's due.
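
The "16 32bit floating point operations in one instruction" figure is just register width divided by element width. A small sketch of that arithmetic; the `lanes` helper and the table name are illustrative, while the register widths are the standard ones for each extension:

```python
# Lanes per SIMD register = register width in bits / element width in bits.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int = 32) -> int:
    """Number of elements of the given width that fit in one register."""
    return register_bits // element_bits


for name, bits in REGISTER_BITS.items():
    print(f"{name}: {lanes(bits)} x float32, {lanes(bits, 64)} x float64")
```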

    5. Re:340% faster? by Anonymous Coward · · Score: 0

      Indeed, let's credit Intel for the shitty fixed-width SIMD paradigm that requires a code rewrite with every revision, and bloats the instruction set and hardware and compiler with legacy garbage that needs to be maintained for compatibility. See SIMD considered harmful. With Cray-style vectors or the more modern RISC-V implementation, the width can be scaled without the sprawling mess.

    6. Re:340% faster? by Tough+Love · · Score: 1

      AVX512 is fucking awesome. 16 32bit floating point operations in one instruction gives me a semi.

      Isn't that what the GPU is for? Which has many, many times the flops. Not sure what configuration Intel is targeting here, seems very boutique.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    7. Re:340% faster? by Agripa · · Score: 1

      AVX512, and even AVX2, are not everything one would hope for. Intel CPUs have to downclock the core when executing AVX512 and some AVX2 instructions which affects everything executing on that core so there are many cases where AVX and a limited selection of AVX2 instructions are faster.
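
A toy model of that trade-off, assuming the whole core runs at the lower clock while wide vector code is present. Every number here (the clock speeds, the share of vectorizable work) is invented for illustration and is not a measurement of any real CPU.

```python
def run_time(scalar_work: float, vector_work: float, lanes: int, ghz: float) -> float:
    """Relative run time: scalar work retires one element per cycle,
    vector work retires `lanes` elements per cycle, all at the same clock."""
    cycles = scalar_work + vector_work / lanes
    return cycles / ghz


# Hypothetical workload: 90% scalar, 10% vectorizable.
scalar, vector = 90.0, 10.0

t_256 = run_time(scalar, vector, lanes=8, ghz=3.0)   # narrower vectors, full clock (assumed)
t_512 = run_time(scalar, vector, lanes=16, ghz=2.6)  # wider vectors, reduced clock (assumed)

print(f"256-bit: {t_256:.2f}  512-bit: {t_512:.2f}")
print("narrower wins" if t_256 < t_512 else "wider wins")
```

With these made-up inputs the narrower path comes out ahead because the clock penalty applies to the scalar work too; swap the proportions so that most of the work is vectorizable and the wider path wins instead.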

    8. Re:340% faster? by Anonymous Coward · · Score: 0

      There are supercomputers out there with GPUs or the Xeon Phi that just sit idle when they're given CPU-only workloads.

      More plainly, server farms / datacenters are chock full of vanilla x86 servers. They got SSE before, got AVX and now AVX512. So soon, there is AVX512 on your server even on low end and even if it's running boring crap like shopping carts, slashdot, payroll, file server or $5/month VPS.

    9. Re:340% faster? by Tough+Love · · Score: 1

      Still seems like a dubious use of silicon. Until the killer app arrives where joe average user is seeing a significant boost, as opposed to a crafted benchmark, more cores and more cache seem like the win. Meanwhile, GPUs dominate the top 500 now, I would guess that CPUs are sitting idle more than GPUs are.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  20. 90% gamers are poor. by Anonymous Coward · · Score: 0

    If the customer wants only 1-core CPU instead of 32 cores then does it divide 32 the price and the power consumption?

    1. Re:90% gamers are poor. by wed128 · · Score: 1

      Xeons aren't for gamers. Most computer equipment isn't.

  21. More Cores? by Anonymous Coward · · Score: 0

    Why bother? I'm reliably informed by Slashdotters that most people only read email and browse the web. Therefore these are unnecessary.

  22. "Independent" ... And who's checking THAT? by Anonymous Coward · · Score: 0

    It’s all useless, until YOU, PERSONALLY have verified either their trustworthiness, or the thing itself.
    Because YOU are who's watching the watchmen. (Or me in my case.)

    And even then, you are not an independent entity, just like any third-party too. You are still subject to manipulation. Like using an Intel compiler (which is deliberately designed to make code slower on AMD CPUs), or leaving XMP disabled on AMD, or not watching out for built-in "benchmark mode" where the device cheats, etc.

    Oh, and if you’re not an expert, or worse, *believe* you’re an expert when you’re not, you cannot even judge the competency of said third party.

  23. Assuming 7-10K being the PRICE? by Anonymous Coward · · Score: 0

    Which is, of course, utterly nuts, ... especially for hardware that will, as always with Apple, be a total piece of shit on the inside (see Louis Rossman's videos), ... and hence probably actually true exactly for that reason.

    To me it still sounds like a surreal joke.
    I couldn't even get to such a low level of conscience if I ripped my brain out. I honestly think we should define two separate species right now: Homo Sapiens, and Homo Psychopathis. Maybe Homo Iumentis too...

  24. "teased plans" means its all just an idea on paper by Anonymous Coward · · Score: 0

    What's the timeline for this? AMD already has engineering samples of 64 core EPYC 7nm chips out there, expected to come early next year. How long until the Intel stuff is actually available, not just "teased plans"?

  25. EPYC's SMT was disabled on that comparison test. by Anonymous Coward · · Score: 0

    Straight from the article, "Intel didn't note whether Hyper-Threading would be available from Cascade Lake-AP chips, and indeed, its comparative numbers against that dual-socket Epyc 7601 system were obtained with SMT off on the AMD platform."

  26. Advanced Placement by Tough+Love · · Score: 1

    AMD schooled Intel.

    --
    When all you have is a hammer, every problem starts to look like a thumb.
  27. Re:"teased plans" means its all just an idea on pa by Anonymous Coward · · Score: 0

    This is another Skylake variant, plus an option of packing a quad system socket into a dual socket basically.
    If you don't use the new SIMD instructions for INT8 and INT16 (neural network inference) a current quad socket wouldn't be too different, with a bit worse performance per watt (same silicon process as Kaby Lake on the older, same as Coffee Lake on the newer). Like, a current computer with four Xeon Gold 6000 series - I have no idea how big that is or how much that costs.