ARM In the Datacenter Isn't Dead Yet (theregister.co.uk)
prpplague writes: Despite Linus Torvalds' recent claims that ARM won't win in the server space, there are very specific use cases where ARM is making advances into the datacenter. One of those is software-defined storage with open-source projects like CEPH. In a recent The Register article, SoftIron's CTO Phil Straw says of their ARM-based CEPH appliances: "It's a totally shitty computer, but what we are trying to do here is storage, and not compute, so when you look at the IO, when you look at the buffering, when you look at the data paths, there's amazing performance -- we can approach something like a quarter of a petabyte, at 200Gbps wireline throughput." Straw claimed that, on average, SoftIron servers run 25C cooler than a comparable system powered by Xeons. So... ARM in the datacenter might be saying, "I'm not quite dead yet!"
The FUD is strong in this submission ...
"We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
Maybe it's not the hardware that's holding ARM back on servers? Maybe it's the software? Sort of like the chicken-and-egg story with ARM on desktop PCs as well. Torvalds is probably right on this one; he is the closest to what is happening in ARM support.
Linus's claim is "that as long as everybody does cross-development, the platform won't be all that stable". I have my web hosting on ARM and I compile on ARM. I cannot find any good, cheap ARM laptop with Ubuntu. If that does not change soon, ARM in servers will die out shortly, like any hype. IMHO the future is not decided yet, and what Linus says is a good indicator of where it is heading. If this summer we have many ARM laptops that sell reasonably well, I will continue hosting on ARM. Otherwise I go back to Intel.
ARM adoption will increase because AWS offers the a1 instance family now. You can now easily fire up servers with ARM hardware to work on your software solutions. For many applications it will be a viable solution with substantial cost savings. Watch the stories and statistics that you start seeing at the summits and re:Invent from customers in 2019.
And that's got intel scared.
Anyone can license the cores, Apple is doing it now for laptops, workstations won't be far behind once the laptops prove they have some serious oomph.
And you can dump the legacy x86 craphole shithacks that place that old cpu way of thinking, and end up with something that is actually quite good and easy to work with.
ARM will win in the datacenter for the simple reasons such as:
1) Cost of CPU is a fraction of intel/amd offerings.
2) Cost of running the CPU in heat and power is *significantly* less than intel/amd
3) *Significantly* cleaner implementation, none of the legacy x86 crap to deal with
4) Need specific instructions? Simply add it to the cpu design, or choose one of the *thousands* of designs already out there
5) Apple are using it for laptops. That means it's got the oomph needed.
6) Not subject to intel "price tiering", which is assholery at its finest and deserves to die in a fire.
7) The market needs this to shake things up and get innovation going again. The new kid is here, and it's kerb stomping time for x86
But it's on its way. And FUD like this just proves how scary that is for the usual suspects.
Amiga Rules!
Lack of Intel Management Engine or other spyware built-in features, that can't be removed without a high degree of risks of permanently damaging the hardware.
ARM can be easily scaled to hundreds of cores maybe more without having an astronomical price and without requiring a nuclear power station sitting on the desk right behind [gaming] PC :)
Cue BSD apologists who will immediately get butthurt and point out that BSD is not, in fact, dead because one company per million (like, Netflix) is still using it, and that it pushes, like 30% of USA nightly traffic, conveniently omitting the fact that in global numbers that's less than 1% of traffic and less than 0.00001% in corporate market share.
There used to be two, but Delphix is migrating to Linux, so there's now only one company that does BSD.
choose one of the *thousands* of designs already out there
Anyone who thinks that's a good thing, especially regarding instruction set architecture extensions, has completely lost the plot.
What they are used for varies by implementation, since ARM is all kinds of things to various people; but 'Trustzone' extensions are specifically designed to provide analogous capabilities(at lower cost, the invisible super-privilege enclave is logically separated but runs on the same CPU rather than being a separate processor); and tends to be used for similar purposes in cases where conditional access enforcement or 'platform integrity' are design goals. ARM SoCs commonly also implement all the features one requires for a full crypto bootloader lockdown.
If you are working at some scale this matters less because you get to dictate if some of those features are enabled, whose keys are burned in as trusted, etc.(unlike Intel, where your leverage is likely to be substantially lower: there is at least one exception, the High Assurance Platform ME firmware variant, but for the most part they aren't terribly open to suggestions in that area); but if you are buying consumer or small business quantities of off-the-shelf ARM there's no particular reason to be more optimistic about how much control you have over the low level behavior.
remember that time when everybody said intel x86 would never make it in the data center...
On a long enough timeline, the survival rate for everyone drops to zero.
I remember when ARM was the cool kid. Now I guess it's just some old geezer yellin', "Get offa my greenboard!"
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
as I said in one of my weekly news Bits livestreams, IMHO it was always about costs, and x86 was just so much cheaper than stuff from Sun, Sgi, IBM, etc. you name it: https://www.youtube.com/watch?...
amd epyc is good for pci-e storage nodes with the 128 pci-e lanes.
also CEPH / ZFS like lots of ram as well.
There's a disembodied zombie ARM in the datacenter! Oh God, it's not dead yet and, ... and it's coming for us!!!!!! AAAAAAAAAAAAAAAAAAAAAAAAAAAAH THE DOOR IS LOCKED!!!
Another advantage is that some ARM processors aren't affected by the speculative execution vulnerabilities. In particular the ARM Cortex-A53, which is used in this server
http://www.socionextus.com/products/synquacer-edge-server/
is immune to speculative execution vulnerabilities.
Rot in prison and write a book about your ass-love with Putin.
What's the matter, compiler switches got you down?
What's the matter, compiler switches got you down?
You've obviously never worked in real datacenters on real enterprise-scale systems.
Only toy systems require recompiling from source code to deploy functionality.
Talk is cheap
dedicated dual or quad core ARM instances for just a few euros. ARM in the cloud sounds like specialty stuff at high prices, but it's readily available at low prices.
I use *BSD but I have never posted on "Usenet".
https://en.wikiquote.org/wiki/...
#DeleteFacebook
The design lacks the minimum security technology to protect data. I doubt ceph will be around long-term either.
At least some competition now reduces prices.
I have not understood why the ARM TrustZone "worlds" aren't used with a hypervisor. It would provide a very armored attack surface, preventing malware in one VM from trying to jump to another. It also would be useful for stuff the OS wants to protect (user credentials, to guard against a pass-the-hash attack).
> 2) Cost of running the CPU in heat and power is *significantly* less than intel/amd
ONLY true when the ARM's performance is significantly less than Intel/AMD as well. Beef an ARM up to i9 specs, and it's going to burn as much power and throw off as much total heat AS an i9 with identical raw performance.
It's like LED lighting. A single LED might throw off light with just milliwatts of power... but crank it up so it's throwing off EXACTLY the same amount of light as a 100-watt halogen lightbulb (measured from every direction), with color fidelity that's at least as good as that 100-watt halogen bulb (none of this "80+ CRI" shit, or even "92+ CRI with weak R9"), and it's going to CONSUME at least 70-80 watts and throw off almost as much heat AS the original incandescent bulb. Because the only way to get deep, saturated reds without making the light appear 'pink' is to crank up the near-infrared output (which stimulates your 'long' cones, without bleeding into green/blue territory and desaturating it). And even if you settle for lower-quality light, the power consumption is no better than fluorescent bulbs, because a "white" LED basically IS a "fluorescent" bulb.
If you want the equivalent of an elderly personal servant or ten thousand army ants instead of a half-dozen deity-like level bosses, ARM might win over Intel/AMD64. Try to scale the army ants TOO much, and you end up wasting most of your effort just trying to keep them coordinated (the current bane of multithreaded programming).
> ARM can be easily scaled to hundreds of cores
And yet, an Android phone with 8+ cores and nominal clock speed of 2GHz+ still can't render a Javascript-heavy web site (like Amazon, Walmart, or Sears) as well as a 15 year old 700MHz Pentium III.
> without having an astronomical price
Scale an ARM-based solution up to the point where it's capable of genuinely matching the performance of an i9, and you'll find that the ARM-based solution is probably quite a bit MORE expensive.
> without requiring a nuclear power station sitting on the desk
Compared to the power and cooling requirements of a Pentium IV with 15kRPM hard drive, an i9 with RTX and SSD is practically a laptop watt-wise. 20 years ago, I literally cut a hole in the wall between my computer room and the hallway so I could put my computer in the hall & pass the cables through the wall to get the heat and noise out of my face.
Well, technically you'd have to say he observes, then opines (literally "to state as one's own opinion"). If all he ever did was observe then we'd never know, would we?
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
ARM has two issues with IO:
1) DMA is not usually cache coherent
x86 has cache coherent DMA. ARM has many different DMA controller options and while some are DMA coherent, most are not. This means that on output it needs to perform an additional cache flush operation before passing the memory to the DMA controller. On input it needs to first perform a cache flush before passing the memory to the DMA controller, and then after the DMA operation has completed it needs to perform a cache invalidate to clear any prefetched data.
2) ARM has weakly ordered memory
The order operations hit the memory can be different from the order that they occur in the code. This isn't just reordering of the code by optimisation (which all compilers do) but actual reordering of the memory operations in hardware. This means that additional memory barriers may be required in comparison with x86.
This means that ARM processors typically perform IO operations slower than x86.
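To make point 1 concrete, here is a toy Python model of the flush/invalidate dance. It is only a sketch under simplifying assumptions: a write-back cache with no eviction, in front of RAM, and a DMA device that only ever sees RAM. The class and function names (`ToySystem`, `transmit`, `receive`, `maintain_cache`) are made up for illustration and do not correspond to any real driver API.

```python
class ToySystem:
    """Toy write-back cache in front of RAM; the DMA device bypasses the cache."""

    def __init__(self):
        self.ram = {}       # address -> value as the DMA device sees it
        self.cache = {}     # address -> value as the CPU sees it
        self.dirty = set()  # cached addresses not yet written back to RAM

    def cpu_write(self, addr, value):
        # Write-back cache: the store lands in the cache, not in RAM.
        self.cache[addr] = value
        self.dirty.add(addr)

    def cpu_read(self, addr):
        # A miss fills the line from RAM; a hit never looks at RAM again.
        if addr not in self.cache:
            self.cache[addr] = self.ram.get(addr, 0)
        return self.cache[addr]

    def cache_flush(self, addr):
        # Write a dirty line back to RAM (the "cache flush" in the post).
        if addr in self.dirty:
            self.ram[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def cache_invalidate(self, addr):
        # Drop the cached copy so the next read goes to RAM.
        self.cache.pop(addr, None)
        self.dirty.discard(addr)

    def dma_read(self, addr):
        return self.ram.get(addr, 0)

    def dma_write(self, addr, value):
        self.ram[addr] = value


def transmit(system, addr, value, maintain_cache):
    """Output path: CPU fills a buffer, then the device reads it via DMA."""
    system.cpu_write(addr, value)
    if maintain_cache:
        system.cache_flush(addr)       # push the data out to RAM first
    return system.dma_read(addr)       # what the device actually sends


def receive(system, addr, value, maintain_cache):
    """Input path: device writes a buffer via DMA, then the CPU reads it."""
    system.cpu_read(addr)              # buffer was touched earlier, so it is cached
    if maintain_cache:
        system.cache_flush(addr)       # no dirty line may overwrite DMA data later
    system.dma_write(addr, value)
    if maintain_cache:
        system.cache_invalidate(addr)  # discard the stale cached copy
    return system.cpu_read(addr)       # what the CPU actually sees
```

Without the maintenance steps the device transmits stale RAM contents and the CPU reads a stale cached value; with them both paths see the right data. On hardware with coherent DMA those extra steps are not needed, which is the cost difference the post is describing.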
I will not argue with the servers being cooler. The x86 chips are more complicated than the ARM chips and that means additional power consumption and cooling requirements.
AMD created the x86-64 architecture, and is making inroads with Epyc. AMD also has some RISC-V work in the pipeline. I'm predicting RISC-V will be big: Intel may try to capitalize on ARM thanks to mobile space, and AMD will start shoving RISC-V (no license fees) into processors for Chromebooks and the like, then into servers running Linux for RISC-V or something.
The next Raspberry Pi might be RISC-V. It's been mentioned. Nobody's taking that seriously yet, and they're not suggesting it seriously yet.
AMD beat Intel once doing this. They invented a whole new architecture and killed IA-64.
Support my political activism on Patreon.
Make sure your new CPU has:
Ram.
Power supply.
Cooling.
Networking.
That can be found, that is on sale and that people like.
A CPU is wonderful.
Now make all the other parts that support the CPU. That work for servers 24/7. At a low price.
An OS would be good too. Software?
Domestic spying is now "Benign Information Gathering"
You've obviously never worked at Scaleway then
A key difference between small cheap computers and large expensive ones is at the chipset level. The large expensive ones come with chipsets that support multiples of:
- Memory channels
- Inter-CPU channels
- I/O channels
For example, while the ARM in a laptop might use a similar technology as a datacentre ARM, the datacentre version will need a lot more pins for memory controllers, inter-CPU cache coherency, and I/O.
ONLY true when the ARM's performance is significantly less than Intel/AMD as well.
ARM has historically had more performance per clock than x86 and x86-64; and modern ARM chips run like 2.4GHz at a watt of peak TDP on four cores.
Think about linear character matching ("abc" in "aaabc" -> "a=a, b!=a" -> "a=a, b!=a" -> "a=a, b=b, c=c" -> match) versus Boyer-Moore ("abc" in "aabc" -> "c:a = 3" -> "c=c, b=b, a=a" -> match). Boyer-Moore finds a string--faster with longer search strings--in large amounts of text with few comparisons, thus issues fewer CPU instructions.
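A minimal Python sketch of that comparison, counting character comparisons for both approaches. Note the substitution: this uses the Horspool simplification of Boyer-Moore (bad-character shifts only, no good-suffix rule), and the function names are made up for illustration.

```python
def naive_search(text, pattern):
    """Left-to-right match attempt at every offset. Returns (index, comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, then skip ahead using a
    bad-character table. Returns (index, comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    if m == 0:
        return 0, 0
    # For each pattern character except the last: distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return start, comparisons
        # Shift by the table entry for the text character under the
        # pattern's last position; unknown characters skip a full pattern length.
        start += shift.get(text[start + m - 1], m)
    return -1, comparisons
```

On the example above, "abc" in "aaabc", the naive scan makes 7 character comparisons and the Horspool variant makes 4. The gap widens with longer patterns and larger texts, because a mismatch on a character that is not in the pattern lets the search jump a whole pattern length.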
CPUs can implement ALUs, decoders, and pipelines to execute the same instruction code in fewer clock cycles. Just like using a different software algorithm, you can use a different hardware approach.
Prefixed instructions and fixed-length instruction sets are core to ARM. Literally every instruction is prefixed. That means where you might compare for one cycle, then jump or not jump on the next cycle, ARM simply jumps or doesn't jump. One fewer cycle.
The decoder doesn't have to deal with figuring out instruction size or the content if it picks an instruction prefixed to only execute if ZF is set, so if you SUB r2, r1 and the result is zero, the next instruction that executes only if ZF is not set is just skipped and the decoder moves on.
Because the CPU will read ahead and cache (preload) the next several instructions (fetches from RAM are slow!), it's technically-possible to block out the next e.g. 10 instructions as IFZ [INSN], and have an ARM CPU internally identify the next several instructions are prefixed IFZ and just skip the instruction pointer ahead that many. Remember: every instruction is exactly one word wide; you don't need to know what the next instruction is to know where the following instruction starts. You don't have to decode the instructions if they won't be executed.
This feature frequently eliminates a large number of comparisons and jumps, trimming down the size of the code body (you'd think variable-length insns would do that, but that usually doesn't work out). More instructions fit into cache, and branch prediction becomes simpler (less power) and more-effective.
ARM also has 30 GPRs. x86-64 has 10 GPRs, plus source/destination/base/count pointer registers that are basically GPRs. A lot happens without using RAM as an intermediate scratch pad.
It's like LED lighting. A single LED might throw off light with just milliwatts of power... but crank it up so it's throwing off EXACTLY the same amount of light as a 100-watt halogen lightbulb (measured from every direction), with color fidelity that's at least as good as that 100-watt halogen bulb (none of this "80+ CRI" shit, or even "92+ CRI with weak R9"), and it's going to CONSUME at least 70-80 watts and throw off almost as much heat AS the original incandescent bulb
Halogen and incandescent bulbs are black-body emitters: much of their light is in the infrared range. LEDs are narrow emitters and use combinations of materials to emit in multiple ranges when providing white light. That means an LED operating on 100 watts of power emits about 80 watts of visible light, while a halogen operating at 100 watts emits about 20 watts of visible light, and an incandescent tungsten-coil bulb emits about 10 watts of visible light.
An LED emitting the same broad-spectrum visible light as a 100-watt halogen would consume 25 watts of power.
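The arithmetic behind that 25-watt figure, as a small Python sketch. The efficiencies are simply the numbers claimed in this post (80% for the LED, 20% for the halogen), taken at face value rather than measured; the function name is made up for illustration.

```python
def input_power_for_same_light(reference_watts, reference_efficiency, target_efficiency):
    """Input power a target source needs to emit the same visible-light power
    as a reference source, given each source's visible-light efficiency."""
    visible_watts = reference_watts * reference_efficiency
    return visible_watts / target_efficiency


# 100 W halogen at 20% -> about 20 W of visible light.
# An LED at 80% needs about 20 / 0.8 = 25 W of input to match it.
led_watts = input_power_for_same_light(100.0, 0.20, 0.80)
```

Plugging in a lower LED efficiency raises the answer proportionally, so the conclusion depends heavily on which efficiency figure you believe.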
Arm runs in hundreds of millions of devices, and so does Linux. They are "niche" products with 1000 times more installed units than x86.
Arm will not disappear anytime soon, and the rumors of Apple's desktop Arm indicate it will become even bigger. It will be in more devices in the server room than Intel in a few years.
I dunno, my 8-core Galaxy S9+ seems fine with rendering websites. After all, that's what I'm posting from and do 98% of my browsing from.
As to the Pentium IV, definitely! Perhaps 7 years ago, I was given a Pentium IV tower, and I threw it in a corner as a headless media server. It only lasted the first month, because of the $40 spike in my electric bill.
Whereas my bench at home has 20 Cortex-A53 cores on it, and the kill-a-watt doesn't creep past 65 W, including 1 TB external RAID, 2 USB hubs (why not?), el cheapo RF keyboard/mouse dongle, two JTAG debuggers, a USB-UART dongle, TV, and HDMI matrix switch. Parallel make and distcc are pretty speedy when you run them that way. (Bit of a personal side project, but you get the idea.)
Both Juniper and Palo Alto use Cavium ARM processors in their hardware, usually for management plane tasks (FPGAs and ASICs do the heavy traffic processing on high end units). And ARM SoCs are popular for switches and routers where raw compute power isn't necessary. Certainly Cisco is the only one willing to stick with low end, neglected Intel Atom offerings even after the Nexus 9k, ISR 4k, and ASA 55x6 series got bit by defective Atom C2000s (sorry bro, your $55k switch just died because of a $41 CPU).
So ARM is great anytime you don't care about CPU processing power, but still want to move data -- storage appliances and network. Which is odd given that in the mobile space the few Atom x86 Android phones to reach the market had lesser raw CPU benchmarks than their ARM contemporaries at the time, yet in actual usage felt much smoother because of wider / faster buses and superior throttling (Had a Zenfone 2 with the Atom and it's still smoother than a lot of Snapdragon 6xx midrange phones).
I need suggestions for commercially made ARM systems that will work in temperature ranges from -35F to 140F (-37C to 60C) for an engineering project. These things are going to be in metal boxes on the side of Texas Highways.
Right now we've got some very impressive Intel systems, but those are in air-conditioned boxes, I'm looking for something that can survive a non-air-conditioned box. When I look for ARM stuff I find a lot of industrial boards, but not a lot of pre-made industrial systems, especially in wide-temperature range devices like I'm looking for.
Any help out there?
I'm trying to talk my company into getting me a Pine64 and the aluminum case for toying around and development purposes, but I don't think that will work for actual field use.
The preceding post was not a Slashvertisement.
Intel/AMD use less power than ARM when it comes to compute loads. ARM is great for high idle and they're claiming IO loads. I wouldn't mind a many core ARM file server or router, but not an app server where the server is under high load.
I spend many millions of dollars every month on new data center hardware. You have not provided a single reason for me, as a mid-sized hardware buyer, to go with ARM.
I do not give a flying fuck what the cpu costs, how much power it uses, the instruction set, etc. I sure as hell do not care about the political shit on your list.
I buy power in bulk. I buy space in bulk. I buy servers in bulk. The cpu is a single component in a larger system of memory, drives, etc. It is not as critical to me as it is to some home gamer hobbyist who is grinding benchmark scores.
I care only about consistency across thousands of servers which makes supporting them easier. Plus or minus a few bucks for the cpu? Not on my list. Power? I already keep the coal plant nearby quite profitable. My costs are still trivial next to the zillions of dollars I make off having an easy to maintain farm of servers without any bullshit. I buy Intel. A lot of Intel. Why? Because it is the smart thing to do. It makes business sense. There is no reason to buy AMD and sure as hell no reason to even consider ARM or anything else at all, ever. Zero reason.
Tl;dr: not a single one of your 7 reasons to buy ARM in data center is meaningful to me as the guy who buys the stuff that fills data centers.
Trains are dying out because of cars. But as long as there's a heavy load to move, trains will exist. FreeBSD is primarily used in infrastructure roles. Plenty of anecdotes of system admins for large datacenters where they switched to FreeBSD because Linux kept crashing under high loads. FreeBSD also holds a near monopoly on publicly funded research and RFCs, both of which need to be open for all uses, which the GPL does not allow.
Linux is great, but it does have its short-comings when you focus on sys-admin friendliness and engineered designs.
5) Apple are using it for "tablets pro" with the same old kiddy-mode iOS that you can't count on to copy an email attachment to a USB drive or whatever, and are keeping all the hardware to themselves
6) If someone is serious at competing with Intel and AMD they will do "price tiering" or at least binning of chips so you can buy the 24-core cheaper than the 32-core etc. even though it's the same chip.
But what I wanted to answer you before your numbered points :
Most of the powerful ARM chips i.e. all those in phones and ipads are built on the SoC model, everything close together which is a major source of power saving by the way. They support a standard of LPDDR memory and all have a memory ceiling like 8GB and lower on older chips. With really cutting edge memory you can get 12GB like with Samsung memory on the Samsung S10 Plus.
A celeron based on desktop architecture may now support up to 128GB!
ARM may support real external DRAM and peripherals but this will chip at their milliwatt and dollar advantages.
1) The CPUs don't exist. AMD is barely making money, if you're thinking there's a viable CPU business making desktop-class chips and undercutting AMD you are wrong.
2) [citation needed]
3) I'll give you that on SIMD. See, e.g., VCVTTPD2UDQ. What else?
4) Oh, great, so now you're spending tens of millions on making your own chips AND you have to support a bunch of open-source projects to maintain the proprietary extensions for your chip. get real.
5) Uh, no they are not. Apple is using it for mobile devices. They might be PLANNING to use it for laptops but that means fuck-all for anyone not buying an Apple laptop because they are not selling their designs to other vendors.
6) See 1.
7) You have a very stupid view of how innovation happens.
Wow, some amazing claims, but you failed to do your research and added hyperbole
Here's why you're wrong:
https://www.extremetech.com/mobile/279988-apples-ipad-pro-a12x-nearly-matches-top-end-x86-cpus-in-geekbench
It's already happened.
It can only be scaled so well because it trades throughput for latency when it comes to cross-core communications. This makes any concurrent workload where there is lots of shared mutable data very slow or requires a completely different design. Message passing could be done, but it does use more memory, which means more cache is used and more memory bandwidth is used. Every design has its pro and cons.
Canes are trash. Go Noles!
ONLY true when the ARM's performance is significantly less than Intel/AMD as well. Beef an ARM up to i9 specs, and it's going to burn as much power and throw off as much total heat AS an i9 with identical raw performance.
It's actually worse. You can't really have high performance and low power usage at the same time, it's a trade off. CPUs have gotten better, but the transistors themselves become more leaky when you design for higher performance, among other architectural trade offs made to increase performance. And pushing transistors optimized for low power to be faster will consume more power than transistors optimized for high performance. The gap is shrinking (pun), but it's still a practical difference.
NXP i.MX 8M Mini Applications Processor Evaluation Board
-40 C to +105 C good enough for you?
Also, this SoC: NXP Semiconductors MIMX8MM6DVTLZAA
From the Pine64 FAQ:
What is Pine A64 operating temperature?
The Pine A64 operating temperature qualified range from 0C to 70C.
You'll need active heating and cooling of some sort to go with a Pine64. I think NXP's i.MX line is more focused on what you're looking for.
still can't render a Javascript-heavy web site (like Amazon, Walmart, or Sears) as well as a 15 year old 700MHz Pentium III.
When was the last time you used a 15 year old 700MHz Pentium III? Eight years ago?
Ezekiel 23:20
Will you asshats please STOP trying to have shared mutable data?
FFS, go look up Amdahl's Law and consider why it's a bluntly stupid thing to propose if you want to process large workloads concurrently.
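For anyone who doesn't want to look it up, a short Python sketch of Amdahl's Law: if a fraction p of the work can run in parallel on n workers and the rest stays serial, the best-case speedup is 1 / ((1 - p) + p / n). The function name is just for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only `parallel_fraction` of the work scales
    across `workers`; the remainder runs serially."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

With 90% of the work parallel, 10 workers give a speedup of only about 5.3, and no number of workers can ever get past 10. Anything that serializes on shared mutable state counts toward the serial part, which is why it hurts so much at high core counts.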
Now that top one there might be exactly what I need. I'm going to look closer and send that to the engineer and see if they'll get me one to toy with. Thank you.
Not being able to buy the CPU you want because Intel can't make enough could be a reason for some.
The "political shit", as far as competition goes, probably SHOULD be of at least some concern to you, because being beholden to a single supplier certainly has risks.
But if you are mid-sized it seems highly unlikely that you'd want to buy "bleeding edge", that is not really surprising.
I guess there would be little reason for you to switch, unless some Arm vendor comes up with a feature that would make your life A LOT easier and Intel just isn't interested because there is too little money in it. Don't think anything like that is in sight so far.
At home I run 2 servers on "shitty computers" with low power requirements. One is an old Atom, and another is a more recent Pentium Silver which runs a max 10 watts. The Silver runs my storage device, and it's far more powerful than I'd ever need so I don't need a "real CPU", and with HW acceleration is powerful enough to do trans-coding. The Atom runs my VoIP server, and also doesn't need any more HP. I'd never put them on ARM for the reasons Linus outlines.
Why would anyone want to run storage on ARM? I do have some non-intel machines, but they're largely all cost constrained. A router running openwrt, and a raspberry pi. The only reason these machines run on MIPS is cost of needing to be under $100. A storage server running CEPH doesn't have this constraint. Does ARM have some special IO abilities that a 10 watt TDP pentium silver doesn't?
Trains are dying out because of cars.
In Europe we build more and more (high speed) railway tracks.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Such a blind rage of ignorance. It's like you think all work loads could be restructured to work well on a GPU.
Imagine a language with all immutable data and no side effects. It would be the purest of all languages and completely useless. You can minimize the amount of things that need to be shared, but you can never get rid of them all. I am in no way trying to have shared data.
Concurrency is not difficult, in user space. I make no claims about kernel space. I have always found all things concurrency to be intuitively solved. ARM CPUs require a different strategy for optimizations. What is a low latency atomic CAS on x86 that has never been a source of contention for me is some horribly slow operation on ARM that Amdahl's Law strikes you down with. You can change your design, but in some cases it's a re-architecture, depending on how performance sensitive and specialized the system is. This can make it expensive to transition to ARM. Even once you're on ARM, it's bad for any situations where the mutable data wasn't a point of contention on x86.
You have to understand where I am coming from. I am known in my company to be great at optimizations. I can generally make someone else's code 10x-100x faster without concurrency. Even when I'm complaining about something being 1/2 as fast as it could be, it's still a magnitude faster than what my peers are making.
What really put me on the map is when our DBA, who is very active in the SQL community and considered a senior member in some circles, and programmer had been working on a process that was taking 30 hours to run and the customer was complaining. I read over the several pages of code in 15 minutes and gave some recommendations before they started. After a month of tracing and testing, they were able to get it down to around 15 hours. I looked over the changes and noticed they did not make a single one that I recommended. At the time 15 hours was "good enough". Fast forward a year later, and the amount of data doubled, causing the run time to double. Back up to 30 hours. I just opened up the project, made all of the changes that I recommended in a matter of 1-2 hours, kicked off the job to test it, and it finished in 1 hour with the exact same results. That was the only test I did, and it passed the first time, and was 30x faster.
I do the same kind of crap with concurrency. All intuition.
I forgot to mention that the performance changes that I made were almost all changes that they say to "never" do because you'll make it slower. Funny how things that should "never" be done because of some objectively measurable negative behavior can create an objectively measurable positive result in the same behavior in some extreme corner cases. "Never" just means "you" will probably do it wrong. Simple solution, don't do it wrong.
You either try to conquer the market or conquer a niche in the market. Xeons clearly have the majority of the market. Therefore ARM finds niches where it's competitive and thrives in those segments. Linus is right for the moment, but that doesn't mean ARM won't find its place and be competitive. I struggle to figure out why this is such a challenging concept to understand.
AMD beat Intel once doing this. They invented a whole new architecture and killed IA-64.
I think you've got that a bit backwards. Intel invented a whole new architecture with IA-64, and AMD killed it by bolting a 64-bit kludge on to Intel's existing x86 architecture.
Thank you for the write up. Are we on the right site? Did we travel back in time?
aRTee
Don't know about that guy but I use a Pentium 120MHz every fucking day. This is what happens when you stop funding government infrastructure. We also still use 1992 model Sun Sparcs, an AS400 and we retired our last PDP11 3 years ago. We are currently replacing a late 70s Westinghouse DPU but we have 4 or 5 more in a standby mothballed status. And guess what? This is what makes your military planes "the best in the world". LOL.
The real problem is that ARM is currently nothing even approaching competitive on a per-core performance metric with Intel.
You need 20 A53 cores to match the performance of 4 Xeon cores.
In some workloads, this doesn't matter, because they can be scaled well.
In a lot of workloads, it simply does matter.
I administrate around 150 servers, and we run 7 datacenters.
I already avoid the slower I know a lot of armchair computing experts like to claim that it is, but I'm sorry. The reality on the ground is that it is not. That's why we're not using AMD, and we're not using ARM. Though I promise you- we look forward to being able to some day.
*avoid the slower clocked Xeons.
Sigh.
Should have been:
I already avoid the slower clocked Xeons. Aggregate performance is simply not comparable in most workloads with highly disparate per-core performance. I know a lot of armchair computing experts like to claim that it is, but I'm sorry. The reality on the ground is that it is not. That's why we're not using AMD, and we're not using ARM. Though I promise you- we look forward to being able to some day.
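A rough Python sketch of why equal aggregate throughput does not mean equal run time. The 5:1 per-core ratio is derived from the "20 A53 cores to match 4 Xeon cores" figure above; the 10% serial share and the function name are assumptions picked purely for illustration, and the model ignores memory, IO, and scheduling effects.

```python
def run_time(work, serial_fraction, cores, per_core_speed):
    """Time to finish `work` units when the serial part runs on one core
    and the remainder splits evenly across all cores."""
    serial_time = work * serial_fraction / per_core_speed
    parallel_time = work * (1.0 - serial_fraction) / (cores * per_core_speed)
    return serial_time + parallel_time


# Same aggregate throughput (20 units per tick), very different per-core speed.
few_fast = run_time(100.0, 0.10, cores=4, per_core_speed=5.0)
many_slow = run_time(100.0, 0.10, cores=20, per_core_speed=1.0)
```

Under these assumptions the four fast cores finish in about 6.5 time units and the twenty slow cores in about 14.5, even though both setups tie on a perfectly parallel job. The serial portion runs at single-core speed, so it dominates on the slow cores.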
The real answer is RISC.
SPARC, POWER and older brother RS/6000, MIPS*, and ARM's granddaddy DEC Alpha dominated the data center space for decades. It was the cost/performance ratio of the far less efficient Intel architectures that let them win in this space.
We could easily reduce data center footprint by 1/3 by using RISC, but that's not how a free market works.
*I have installed huge SGI servers
Kriston
There are some hypervisors that use TrustZone for various things; mostly commercial and relatively low profile (Sierraware has one, as does Mentor Graphics, and there are a number of other projects and research papers; no personal experience with them). What's less common is a hypervisor used the way we are accustomed to (just carving a big system up into a bunch of smaller VMs for resource efficiency and abstraction purposes). The prevailing use seems to be adding features that stock TrustZone doesn't have without expanding the explicitly trusted code base too much. That means going from the basic split ('untrusted general-purpose OS' versus 'minimal trusted world that does hard real-time or DRM stuff you don't want the general-purpose OS messing with') to moving the hypervisor into the trusted world. The big, feature-rich, buggy OS then runs as one guest, alongside one or more special-purpose, reasonably highly trusted VMs that are protected from the untrusted VM but don't have to be brought fully into the trusted world.
I'm not sure if this is because these products have a history of being sold for embedded systems, set-top boxes with DRM/conditional access requirements, and the like; or whether it's because ARM systems large enough to be worth chopping up into multiple general-purpose VMs are still pretty uncommon; or some of both.
In principle the TrustZone features are applicable to the protection of general-purpose hypervisors or sensitive credential handling and such. But aside from the immaturity of software aimed at those use cases, there is also the issue that the device administrator is not typically intended to modify what goes on within the trusted side; again, likely a legacy of deployments where it's being used to keep the user from messing with it. If you are ordering enough units the vendor will presumably do whatever you tell them to; but as-is, expecting to modify TrustZone behavior is not unlike expecting to edit an x86 board's UEFI to change SMM behavior. Not always impossible if the assorted checks are buggy; but deeply not encouraged.
Blah, blah, blah!
"ARM will win in the datacenter for ..."
All those incredible, invincible, unassailable arguments are negated only by the total failure of ARM in the datacenter. For 2 decades now, roughly. You are a genius no doubt about it!
Maybe a decade or two ago, when Linux was more of a hobby than an enterprise-grade, professional product like FreeBSD was. Fast forward to 2019, the roles are pretty much reversed. FreeBSD today is interesting in two aspects of computing: storage (due to ZFS being native) and networking (due to its quite capable network stack). But... ZFS on FreeBSD is being rebased to ZFS on Linux (because Delphix said nope to FreeBSD), so that advantage is lost. And the Linux network stack is gaining in performance and functionality very fast. eBPF is comparable to DTrace, and with some intentions (google through LWN.net) to utilize BPF to filter network packets, FreeBSD will have exactly zero advantage. In fact the primary reason Netflix uses FreeBSD for edge appliances (but not for the main applications!) is that the developers were/are very familiar with it and are able to actually modify the kernel and the OS to suit their needs. Because Netflix runs NetflixBSD, a fork of FreeBSD. None of the additions and modifications they did are available upstream to common mortals downloading an ISO off of freebsd.org.
Intel proved long ago that they could create a modern processor with an engine that was as powerful as any RISC chip, yet have that processor follow the x86 instruction set. AMD proved this wasn't a fluke. The AMD64 architecture is fine; the advantages and disadvantages of ARM vs AMD64 are due to implementation details.
You can ALWAYS find one algorithm or another that encodes better for one chip vs another. That's why SPECint and SPECfp exist. They find the most complex algorithms and most complex problems and test against them. Nobody tests based on LINPACK or Dhrystone or Whetstone. They test on real algorithms that have proven difficult to optimize away.
Right now Intel leads in all SPECint measurements. ARM doesn't even publish results because they're all too busy making phones. But, they do have the economy of scale to tackle that at some point if they want to...
More GPRs are overrated for most general loads. When AMD was designing x86-64, they tried many different numbers of registers and had knowledgeable people writing compilers to take advantage of those extra registers. Very rarely did it result in more performance, and overall it resulted in less performance, because the more registers are exposed, the more state has to be captured during context switches and whatnot. x86-64 implementations generally have hundreds of virtual registers, allowing most of the benefit of more registers without the overhead of having to push more to the stack.
More registers is objectively worse in most situations, but better in a very limited set of cases.
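As a back-of-the-envelope illustration of the "more architectural state to save" argument, the sketch below compares only the general-purpose register file of x86-64 (16 GPRs) and AArch64 (31 GPRs), both 64 bits wide. It deliberately ignores flags, the program counter, and FP/SIMD state, which in practice is far larger than the integer registers, so treat it as a rough lower bound rather than a real context-switch cost.

```python
GPR_WIDTH_BYTES = 8  # both ISAs use 64-bit general-purpose registers

# Architecturally visible general-purpose registers per ISA.
# AArch64 counts x0..x30; the stack pointer is left out for both.
GPR_COUNT = {
    "x86-64": 16,
    "AArch64": 31,
}


def gpr_state_bytes(isa):
    """Bytes needed to save every general-purpose register of `isa`."""
    return GPR_COUNT[isa] * GPR_WIDTH_BYTES


for isa in GPR_COUNT:
    print(f"{isa}: {GPR_COUNT[isa]} GPRs, {gpr_state_bytes(isa)} bytes to save")
```

The integer-register difference is on the order of a hundred bytes per save, which is small next to vector state; whether it matters depends on how often the workload switches contexts or spills around calls.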
About 4 years ago, I dusted off an old Compaq Armada laptop (700MHz Pentium III, 512MB RAM) and tried using it with a minimalist Linux distro. The performance of Chrome or Firefox with Amazon.com and Walmart.com was SLIGHTLY better than the performance of the same two web sites with my then-new Galaxy Note 4 (all using Wi-Fi, so the mobile network quality never entered into the equation).
The Pentium III is a great reference point, because its zenith (the 1.4GHz Pentium III Xeon) marked the point when Intel temporarily ran into a brick wall and spent the better part of a decade unable to meaningfully improve its single-thread performance. During that period, it was struggling to compete in ARM's home court, and ARM was scoring occasional victories. Compared to Intel's lower-end CPUs, like the Atom family (which, as I understand it, were basically just die-shrunk Celeron IV processors with power management bolted on, the same way the Pentium M family were basically die-shrunk Pentium III Xeons with power management bolted on), ARM looked fairly respectable.
Seriously, I challenge anyone to name ANY ARM-based CPU or SoC that satisfies the following requirements:
* Absolute single-thread performance as good as, or better than, an Intel i9 9900K or 9900KF
* Multi-thread performance as good as, or better than, an Intel i9 9900K[F], the only constraint being that all of the CPU cores have to be on the same slab of silicon inside the same packaged chip.
* The ARM chip has to be available in single quantities, to anyone with a valid credit card, with at most a 7-day lead time, for less than what it would cost our same hypothetical consumer with a credit card to purchase an i9 9900K[F].
* The ARM chip has to be usable in a system that a reasonably sophisticated end user (who nevertheless is NOT Linus Torvalds or doing it as a work-related project with the resources of his employer at his disposal) could install Windows or Linux upon and run normal software directly (emulation doesn't count).
Simply put, such a hypothetical ARM chip IS, in fact, hypothetical. It's fantasy. Even if you managed to solve the chip-availability problem, you'd literally have to be a company with the resources of Qualcomm or Samsung to get anywhere close to being capable of obtaining a genuinely i9-competitive chip and using it in a system capable of running Windows or Linux, because basically NOTHING in ARM-land is standardized. Everything near the highest-performance bleeding edge of ARM ends up being couture and proprietary, and without living at that razor's edge, you can forget about getting performance that's any better than what you'd get from a mediocre, middle-of-the-road x86/AMD64 solution.
ARM is for building cheap, disposable army ants and pack mules. Intel architecture is for building omnipotent combat robots capable of leveling the countryside while its AI ponders early 20th-century human philosophy and its structural contradictions. There's a huge gray area between the two extremes, but the fact is, if you want extreme performance at a halfway-sane price using off the shelf technology and products that aren't one-off custom prototypes, you have one real choice: x86/AMD64.
If you want a new laptop whose designers wished they could have made a sealed slab of translucent Lucite with 60 hour battery life... even if it's slow, and sucks heinously at running any kind of normal Windows or Linux desktop application... by all means, go ARM. If you want 90fps realtime raytracing for your Oculus Rift, pretty much ANY solution involving ARM-based COTS hardware is guaranteed to leave you disappointed for the conceivable future.