ARM In the Datacenter Isn't Dead Yet (theregister.co.uk)
prpplague writes: Despite Linus Torvalds' recent claims that ARM won't win in the server space, there are very specific use cases where ARM is making advances into the datacenter. One of those is software-defined storage with open-source projects like Ceph. In a recent The Register article, SoftIron's CTO Phil Straw says of their ARM-based Ceph appliances: "It's a totally shitty computer, but what we are trying to do here is storage, and not compute, so when you look at the IO, when you look at the buffering, when you look at the data paths, there's amazing performance -- we can approach something like a quarter of a petabyte, at 200Gbps wireline throughput." Straw claimed that, on average, SoftIron servers run 25C cooler than a comparable system powered by Xeons. So... ARM in the datacenter might be saying, "I'm not quite dead yet!"
> ARM can be easily scaled to hundreds of cores
And yet, an Android phone with 8+ cores and a nominal clock speed of 2GHz+ still can't render a JavaScript-heavy web site (like Amazon, Walmart, or Sears) as well as a 15-year-old 700MHz Pentium III.
> without having an astronomical price
Scale an ARM-based solution up to the point where it's capable of genuinely matching the performance of an i9, and you'll find that the ARM-based solution is probably quite a bit MORE expensive.
> without requiring a nuclear power station sitting on the desk
Compared to the power and cooling requirements of a Pentium IV with 15kRPM hard drive, an i9 with RTX and SSD is practically a laptop watt-wise. 20 years ago, I literally cut a hole in the wall between my computer room and the hallway so I could put my computer in the hall & pass the cables through the wall to get the heat and noise out of my face.
AMD created the x86-64 architecture, and is making inroads with Epyc. AMD also has some RISC-V work in the pipeline. I'm predicting RISC-V will be big: Intel may try to capitalize on ARM thanks to mobile space, and AMD will start shoving RISC-V (no license fees) into processors for Chromebooks and the like, then into servers running Linux for RISC-V or something.
The next Raspberry Pi might be RISC-V. It's been mentioned. Nobody's taking that seriously yet, and they're not suggesting it seriously yet.
AMD beat Intel once doing this. They invented a whole new architecture and killed IA-64.
Support my political activism on Patreon.
ONLY true when the ARM's performance is significantly less than Intel/AMD as well.
ARM has historically had more performance per clock than x86 and x86-64; and modern ARM chips run like 2.4GHz at a watt of peak TDP on four cores.
Think about linear character matching ("abc" in "aaabc" -> "a=a, b!=a" -> "a=a, b!=a" -> "a=a, b=b, c=c" -> match) versus Boyer-Moore ("abc" in "aabc" -> "c:a = 3" -> "c=c, b=b, a=a" -> match). Boyer-Moore finds a string--faster with longer search strings--in large amounts of text with few comparisons, thus issues fewer CPU instructions.
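To put rough numbers on that, here's a minimal Python sketch that counts character comparisons for a plain left-to-right scan and for Boyer-Moore-Horspool, a simplified Boyer-Moore variant that uses only the bad-character shift. The function names and the test text are made up for illustration, and comparison counts are only a proxy for instructions issued.

```python
def naive_search(text, pattern):
    """Left-to-right scan. Returns (index or -1, character comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool (bad-character rule only).

    Returns (index or -1, character comparisons).
    """
    comparisons = 0
    n, m = len(text), len(pattern)
    if m == 0:
        return 0, 0
    # For each character except the pattern's last one, how far the
    # window may slide when that character sits under the window's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    start = 0
    while start <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return start, comparisons
        # Characters not in the pattern allow a full-length jump.
        start += shift.get(text[start + m - 1], m)
    return -1, comparisons


# Hypothetical test input: a long run of filler followed by the target.
haystack = "x" * 1000 + "needle"
print("naive:   ", naive_search(haystack, "needle"))
print("horspool:", horspool_search(haystack, "needle"))
```

The longer the pattern, the bigger the jumps Horspool can take over text that doesn't contain the pattern's characters, which is the "faster with longer search strings" effect.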
CPUs can implement ALUs, decoders, and pipelines to execute the same instruction code in fewer clock cycles. Just like using a different software algorithm, you can use a different hardware approach.
Prefixed instructions and fixed-length instruction sets are core to ARM. Literally every instruction is prefixed. That means where you might compare for one cycle, then jump or not jump on the next cycle, ARM simply jumps or doesn't jump. One fewer cycle.
The decoder doesn't have to deal with figuring out instruction size or the content if it picks an instruction prefixed to only execute if ZF is set, so if you SUB r2, r1 and the result is zero, the next instruction that executes only if ZF is not set is just skipped and the decoder moves on.
Because the CPU will read ahead and cache (preload) the next several instructions (fetches from RAM are slow!), it's technically-possible to block out the next e.g. 10 instructions as IFZ [INSN], and have an ARM CPU internally identify the next several instructions are prefixed IFZ and just skip the instruction pointer ahead that many. Remember: every instruction is exactly one word wide; you don't need to know what the next instruction is to know where the following instruction starts. You don't have to decode the instructions if they won't be executed.
This feature frequently eliminates a large number of comparisons and jumps, trimming down the size of the code body (you'd think variable-length insns would do that, but that usually doesn't work out). More instructions fit into cache, and branch prediction becomes simpler (less power) and more-effective.
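Here's a toy Python model of that idea, assuming a made-up three-operation machine where every instruction carries a condition field (AL = always, EQ = zero flag set, NE = zero flag clear). It is not real ARM encoding and says nothing about actual cycle counts; it only shows the same if/else written with branches and with conditional instructions, and how many instruction slots each version takes.

```python
def run(program, regs):
    """Run a list of (cond, op, *args) tuples on a hypothetical machine.

    Returns (registers, executed count, skipped count).
    """
    regs = dict(regs)
    zero = False
    pc = executed = skipped = 0
    while pc < len(program):
        cond, op, *args = program[pc]
        pc += 1  # fixed width: the next instruction is always one slot on
        if (cond == "EQ" and not zero) or (cond == "NE" and zero):
            skipped += 1  # condition failed, so the slot acts as a no-op
            continue
        executed += 1
        if op == "SUBS":    # dst = a - b, and update the zero flag
            dst, a, b = args
            regs[dst] = regs[a] - regs[b]
            zero = regs[dst] == 0
        elif op == "MOV":   # dst = immediate value
            dst, imm = args
            regs[dst] = imm
        elif op == "B":     # jump to an absolute slot index
            pc = args[0]
        else:
            raise ValueError(f"unknown op {op}")
    return regs, executed, skipped


# r0 = 1 if r1 == r2 else 2, written with branches around each arm...
branching = [
    ("AL", "SUBS", "r3", "r1", "r2"),
    ("NE", "B", 4),
    ("AL", "MOV", "r0", 1),
    ("AL", "B", 5),
    ("AL", "MOV", "r0", 2),
]

# ...and with the condition folded into the instructions themselves.
predicated = [
    ("AL", "SUBS", "r3", "r1", "r2"),
    ("EQ", "MOV", "r0", 1),
    ("NE", "MOV", "r0", 2),
]

for name, prog in (("branching", branching), ("predicated", predicated)):
    regs, executed, skipped = run(prog, {"r1": 5, "r2": 5})
    print(name, "slots:", len(prog), "r0:", regs["r0"],
          "executed:", executed, "skipped:", skipped)
```

Both versions compute the same result; the conditional version just does it in fewer slots and with no jumps for a branch predictor to guess at.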
64-bit ARM also has 31 GPRs. x86-64 has 16, counting the source/destination/base/count pointer registers that are basically GPRs. A lot happens without using RAM as an intermediate scratch pad.
It's like LED lighting. A single LED might throw off light with just milliwatts of power... but crank it up so it's throwing off EXACTLY the same amount of light as a 100-watt halogen lightbulb (measured from every direction), with color fidelity that's at least as good as that 100-watt halogen bulb (none of this "80+ CRI" shit, or even "92+ CRI with weak R9"), and it's going to CONSUME at least 70-80 watts and throw off almost as much heat AS the original incandescent bulb.
Halogen and incandescent bulbs are black-body emitters: much of their light is in the infrared range. LEDs are narrow emitters and use combinations of materials to emit in multiple ranges when providing white light. That means an LED operating on 100 watts of power emits about 80 watts of visible light, while a halogen operating at 100 watts emits about 20 watts of visible light, and an incandescent tungsten-coil bulb emits about 10 watts of visible light.
An LED emitting the same broad-spectrum visible light as a 100-watt halogen would consume 25 watts of power.
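The arithmetic behind that 25-watt figure, as a quick Python sketch. The visible-light fractions are the ones claimed in this comment (80% for LED, 20% for halogen), not measured values, and the function name is made up for illustration.

```python
def input_watts_for_same_visible_light(source_watts, source_visible_fraction,
                                       target_visible_fraction):
    """Input power a target lamp needs to match a source lamp's visible output.

    The fractions are the share of input power emitted as visible light.
    """
    visible_watts = source_watts * source_visible_fraction
    return visible_watts / target_visible_fraction


# Fractions as claimed in the comment above, not measurements.
HALOGEN_VISIBLE = 0.20
LED_VISIBLE = 0.80

led_watts = input_watts_for_same_visible_light(100, HALOGEN_VISIBLE, LED_VISIBLE)
print(f"LED input power to match a 100 W halogen: {led_watts:.1f} W")
```

Plug in different fractions if you don't buy those numbers; the ratio of the two fractions is all that matters.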