ARM In the Datacenter Isn't Dead Yet (theregister.co.uk)
prpplague writes: Despite Linus Torvalds's recent claims that ARM won't win in the server space, there are very specific use cases where ARM is making advances into the datacenter. One of those is software-defined storage with open-source projects like Ceph. In a recent article in The Register, SoftIron's CTO Phil Straw says of the company's ARM-based Ceph appliances: "It's a totally shitty computer, but what we are trying to do here is storage, and not compute, so when you look at the IO, when you look at the buffering, when you look at the data paths, there's amazing performance -- we can approach something like a quarter of a petabyte, at 200Gbps wireline throughput." Straw claimed that, on average, SoftIron servers run 25C cooler than a comparable system powered by Xeons. So... ARM in the datacenter might be saying, "I'm not quite dead yet!"
Lack of Intel Management Engine or other built-in spyware features that can't be removed without a high risk of permanently damaging the hardware.
Remember that time when everybody said Intel x86 would never make it in the data center...
On a long enough timeline, the survival rate for everyone drops to zero.
There's a disembodied zombie ARM in the datacenter! Oh God, it's not dead yet and, ... and it's coming for us!!!!!! AAAAAAAAAAAAAAAAAAAAAAAAAAAAH THE DOOR IS LOCKED!!!
> ARM can be easily scaled to hundreds of cores
And yet, an Android phone with 8+ cores and a nominal clock speed of 2GHz+ still can't render a JavaScript-heavy web site (like Amazon, Walmart, or Sears) as well as a 15-year-old 700MHz Pentium III.
> without having an astronomical price
Scale an ARM-based solution up to the point where it's capable of genuinely matching the performance of an i9, and you'll find that the ARM-based solution is probably quite a bit MORE expensive.
> without requiring a nuclear power station sitting on the desk
Compared to the power and cooling requirements of a Pentium IV with 15kRPM hard drive, an i9 with RTX and SSD is practically a laptop watt-wise. 20 years ago, I literally cut a hole in the wall between my computer room and the hallway so I could put my computer in the hall & pass the cables through the wall to get the heat and noise out of my face.
it reverses the traditional storage server layout by moving CPUs to the front of the PCB and storage drives to the back. This means cool air from the fans blows over the drives first, and then the CPUs, which wouldn't make any sense in a compute server.
Um... What?
Modern servers cool front to back. They place the drives in front where they are cooled first. Some place the CPUs behind the drives. Others place the CPUs in parallel with the drives so that they're also cooled directly from ambient air.
No one in their right mind places the drives after the CPUs. Losing a CPU is just money. Losing the drive is everything.
Were you maybe trying to say they put the fans in front of the drives instead of behind them, pushing air instead of pulling it? That doesn't make any real difference to the cooling, but it makes hot-swap harder.
If these machines actually cool back to front, that's a bad thing. Modern data centers are laid out with a hot-aisle cold-aisle design. The wiring and exhaust side (the back) faces the hot aisle. Equipment which reverses that flow effs everything up. Cisco is especially bad about this, but servers mostly get it right.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Well, technically you'd have to say he observes, then opines (literally "to state as one's own opinion"). If all he ever did was observe then we'd never know, would we?
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
AMD created the x86-64 architecture, and is making inroads with Epyc. AMD also has some RISC-V work in the pipeline. I'm predicting RISC-V will be big: Intel may try to capitalize on ARM thanks to mobile space, and AMD will start shoving RISC-V (no license fees) into processors for Chromebooks and the like, then into servers running Linux for RISC-V or something.
The next Raspberry Pi might be RISC-V. It's been mentioned. Nobody's taking that seriously yet, and they're not suggesting it seriously yet.
AMD beat Intel once doing this. They invented a whole new architecture and killed IA-64.
Support my political activism on Patreon.
ONLY true when the ARM's performance is significantly less than Intel/AMD as well.
ARM has historically had more performance per clock than x86 and x86-64; and modern ARM chips run like 2.4GHz at a watt of peak TDP on four cores.
Think about linear character matching ("abc" in "aaabc" -> "a=a, b!=a" -> "a=a, b!=a" -> "a=a, b=b, c=c" -> match) versus Boyer-Moore ("abc" in "aaabc" -> "c!=a, shift 2" -> "c=c, b=b, a=a" -> match). Boyer-Moore finds a string in large amounts of text with few comparisons, and gets faster with longer search strings, so it issues fewer CPU instructions.
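To put rough numbers on that comparison, here is a minimal sketch that counts character comparisons for both approaches. It is not full Boyer-Moore: it uses the simpler Horspool variant (bad-character shifts only), and the function names are made up for illustration.

```python
def naive_search(text, pattern):
    """Left-to-right scan at every offset. Returns (index, comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, then shift by a
    precomputed distance. Returns (index, comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    # Distance from each character's last occurrence (ignoring the final
    # pattern character) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return start, comparisons
        # Shift based on the text character under the pattern's last slot;
        # characters not in the table allow a full-length jump.
        start += shift.get(text[start + m - 1], m)
    return -1, comparisons


print(naive_search("aaabc", "abc"))     # (2, 7)
print(horspool_search("aaabc", "abc"))  # (2, 4)
```

On a toy input the gap is small; it widens when a long pattern is searched in text that mostly doesn't contain its characters, because each mismatch lets the search jump ahead by up to the pattern length.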
CPUs can implement ALUs, decoders, and pipelines to execute the same instruction code in fewer clock cycles. Just like using a different software algorithm, you can use a different hardware approach.
Prefixed instructions and fixed-length instruction sets are core to ARM. Literally every instruction is prefixed. That means where you might compare for one cycle, then jump or not jump on the next cycle, ARM simply jumps or doesn't jump. One fewer cycle.
The decoder doesn't have to deal with figuring out instruction size or the content if it picks an instruction prefixed to only execute if ZF is set, so if you SUB r2, r1 and the result is zero, the next instruction that executes only if ZF is not set is just skipped and the decoder moves on.
Because the CPU will read ahead and cache (preload) the next several instructions (fetches from RAM are slow!), it's technically possible to block out the next e.g. 10 instructions as IFZ [INSN], and have an ARM CPU internally identify that the next several instructions are prefixed IFZ and just skip the instruction pointer ahead that many. Remember: every instruction is exactly one word wide; you don't need to know what the next instruction is to know where the following instruction starts. You don't have to decode the instructions if they won't be executed.
This feature frequently eliminates a large number of comparisons and jumps, trimming down the size of the code body (you'd think variable-length insns would do that, but that usually doesn't work out). More instructions fit into cache, and branch prediction becomes simpler (less power) and more effective.
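A toy model of the conditional-execution idea described above, as a sketch only. The tuple format and the `run` function are hypothetical and are not real ARM encodings; the condition names EQ, NE and AL are borrowed from ARM's condition codes. It models behaviour (which instructions take effect), not cycle counts.

```python
def run(program, regs):
    """Execute a list of fixed-width (cond, op, dst, src) instructions.

    Every instruction occupies exactly one slot, so the next one is always
    at pc + 1 and a failed condition just means "treat as a no-op".
    Returns (regs, executed, skipped).
    """
    zf = False  # zero flag, set by SUBS
    executed = skipped = 0
    for cond, op, dst, src in program:
        if (cond == "EQ" and not zf) or (cond == "NE" and zf):
            skipped += 1  # condition failed: no branch, no effect
            continue
        executed += 1
        value = regs[src] if isinstance(src, str) else src  # register or immediate
        if op == "SUBS":
            regs[dst] -= value
            zf = regs[dst] == 0
        elif op == "MOV":
            regs[dst] = value
        elif op == "ADD":
            regs[dst] += value
    return regs, executed, skipped


# if (r2 - r1 == 0) r0 = 1; else r0 = r3;  written with no jump instructions
program = [
    ("AL", "SUBS", "r2", "r1"),
    ("EQ", "MOV", "r0", 1),
    ("NE", "MOV", "r0", "r3"),
]
regs, executed, skipped = run(program, {"r0": 0, "r1": 5, "r2": 5, "r3": 9})
print(regs["r0"], executed, skipped)  # 1 2 1
```

The same three-slot program handles both outcomes: with r2 = 7 the EQ instruction is skipped instead and r0 ends up as 9.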
64-bit ARM also has 31 GPRs. x86-64 has 16 GPRs, and that count already includes the source/destination/base/count pointer registers that are basically GPRs. A lot happens without using RAM as an intermediate scratch pad.
It's like LED lighting. A single LED might throw off light with just milliwatts of power... but crank it up so it's throwing off EXACTLY the same amount of light as a 100-watt halogen lightbulb (measured from every direction), with color fidelity that's at least as good as that 100-watt halogen bulb (none of this "80+ CRI" shit, or even "92+ CRI with weak R9"), and it's going to CONSUME at least 70-80 watts and throw off almost as much heat AS the original incandescent bulb.
Halogen and incandescent bulbs are black-body emitters: much of their light is in the infrared range. LEDs are narrow emitters and use combinations of materials to emit in multiple ranges when providing white light. That means an LED operating on 100 watts of power emits about 80 watts of visible light, while a halogen operating at 100 watts emits about 20 watts of visible light, and an incandescent tungsten-coil bulb emits about 10 watts of visible light.
An LED emitting the same broad-spectrum visible light as a 100-watt halogen would consume 25 watts of power.
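The arithmetic behind that 25-watt figure, as a quick sketch. The efficiency fractions are the ones claimed in this comment (about 80% of input power as visible light for the LED, about 20% for halogen), not measured values, and the function name is made up for illustration.

```python
def input_power_for_visible(visible_watts, visible_fraction):
    """Input power needed to emit `visible_watts` of visible light when
    only `visible_fraction` of the input ends up as visible light."""
    return visible_watts / visible_fraction


# Figures as claimed in the comment above (assumptions, not measurements).
HALOGEN_VISIBLE_FRACTION = 0.20
LED_VISIBLE_FRACTION = 0.80

halogen_visible = 100 * HALOGEN_VISIBLE_FRACTION  # visible output of a 100 W halogen
led_watts = input_power_for_visible(halogen_visible, LED_VISIBLE_FRACTION)
print(f"LED input for the same visible output: about {led_watts:.0f} W")
```

Under those assumptions the LED needs about a quarter of the halogen's input power; a different LED efficiency changes the result proportionally.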