ARM Readies Cores For 64-Bit Computing
snydeq writes "ARM Holdings will unveil new plans for processing cores that support 64-bit computing within the next few weeks, and has already shown samples at private viewings, InfoWorld reports. ARM's move to put out a 64-bit processing core will give its partners more options to design products for more markets, including servers, the source said. The next ARM Cortex processor to be unveiled will support 64-bit computing. An announcement of the processor could come as early as next week, and may provide further evidence of a collision course with Intel."
I know folks think it's 'overkill' to have 64-bit CPUs in portable devices, but consider that the -entirety- of storage and RAM can be mmapped in the 64-bit address space... That opens up a lot of options for stuff like putting entire applications to sleep and instantly getting them back, distributing one-time-use applications that are already running, sharing a running app with another person and syncing the whole instance (not just a data file) over the Internet, and other cool futuristic stuff.
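A minimal sketch of the "storage mapped into the address space" idea, using Python's standard `mmap` module. The scratch file and the helper names (`map_and_edit`, `demo`) are made up for illustration; the point is only that once a file is mapped, editing it looks like an ordinary memory write.

```python
import mmap
import os
import tempfile


def map_and_edit(path: str) -> bytes:
    """Map a file into the process address space and edit it through memory."""
    with open(path, "r+b") as f:
        # Length 0 asks for the whole file to be mapped.
        with mmap.mmap(f.fileno(), 0) as view:
            view[0:5] = b"HELLO"  # looks like a plain memory write
            view.flush()          # push the dirty page back to storage
    with open(path, "rb") as f:
        return f.read()


def demo() -> bytes:
    # Hypothetical scratch file standing in for "storage".
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, storage")
        return map_and_edit(path)
    finally:
        os.remove(path)
```

On a 32-bit process the mapping has to fit in whatever is left of 4GB of virtual address space, which is why the trick only becomes general with 64-bit addressing.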
I'm wondering when the first server/desktop OS is going to come out that realizes this and starts to merge the 'RAM' and 'Storage' into one 64-bit long field of 'fast' and 'slow' storage. Say goodbye to Swap, and antiquated concepts like 'booting up' and 'partitions'.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
You don't see the use?
low-latency bare-metal fileservers that consume only a few watts, but can natively handle huge filesystems and live encryption? It's a lot easier to handle a multi-TB storage array when you're 64-bit native, same for encryption. Look at Linux benchmarks for 32 vs 64-bit filesystem and OpenSSH performance.
Do you have any idea how many $4,000 Intel Xeon boxes basically sit and do nothing all day at the average enterprise? If you can put Linux on these beasties, you could have a cheap place for projects to start; if load ever kills the 2GHz ARM blade, you can migrate the app over to an Intel VM or bare metal. I'll bet 80% of projects never leave the ARM boxes, though.
My whole department (currently seven bare-metal Intel servers and five VMs) could run entirely off of a few ARM boxes running Linux. It would probably save an employee's worth of power, cooling, upkeep, and upgrade costs every year.
Would be the most exciting revolution to watch. Since it has a totally different design it changes the parameters of how hardware end products can be built.
As ARM cores are so simple and ARM Holdings does not have its own fabs, anyone could come up with their own optimized ARM-compatible CPUs. It's one of those moments when the right economics and the right technology could fuse together and change stuff.
Can you please explain the advantage of ARM over x86 in the server room, because this one has me scratching my head. While I'm all for different arches (I have a PPC G3 Mac just so I could play with non-x86) I thought the whole point of ARM was that it was super low power for mobile devices? While I'm sure cutting down power usage in the server room would not be a BAD thing, considering how much software, both for Windows AND Linux, isn't for ARM based CPUs, I just don't get what the advantage of this would be over say a Bobcat, Nano, or Atom based solution.
Now in mobile I get it, as you can make a cheap iPad knockoff that can get 8+ hours of battery life, but in servers? Maybe there is a use case I don't know of, but when I was setting up servers while power was a consideration it certainly wasn't looked at as a priority over the performance in server roles. How well does ARM handle large amounts of users? How well does it scale with increased demands? While I wish them all the best I just haven't seen a screaming need for these, not when you already have Atom and Nano and are about to have Bobcat and Bulldozer (which from the looks of it will be nice as it has a well built GPU in the Bobcat and Bulldozer so AMD stream coding could be used) all in that same market. What am I missing here?
ACs don't waste your time replying, your posts are never seen by me.
Well, considering that somewhere between 60-90% of the desktop market in reality does not care what their computer is running, so long as they've got access to a browser and Facebook and in the worst case an office suite on the side for minor work, it would not really have mattered.
The only real problem is not Windows, it is getting the computers into the mainstream stores to be sold alongside the MacBooks and the various normal Windows OEM solutions. Just getting it there would mean instant market share overnight, because only a minority is application bound in reality.
Look at Linux benchmarks for 32 vs 64-bit filesystem and OpenSSH performance
What benchmarks are you looking at? If you're comparing x86 to x86-64, then you are going to get some very misleading numbers. In addition to the increased address space, x86-64 gives:
Offsetting this is the fact that all pointers are now twice as big, which means that you use more data cache. On a more sane architecture, such as SPARC, PowerPC, or MIPS, you get none of these advantages (or, rather, removal of disadvantages), so 64-bit code generally runs slightly slower. The only reason to compile in 64-bit mode on these architectures is if you want more than 4GB of virtual address space in a process.
The ARM Cortex A15 supports 40-bit physical addresses, allowing up to 1TB of physical memory to be addressed. Probably not going to be enough for everyone forever, but definitely a lot more than you'll find in a typical server for the next couple of years. It only supports 32-bit virtual addresses, so you are limited to 4GB per process, but that's not a serious limitation for most people.
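The figures above are just powers of two; a quick check of the arithmetic (the helper name `addressable_bytes` is made up for illustration):

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 1 << address_bits


GIB = 1 << 30
TIB = 1 << 40

physical = addressable_bytes(40)  # 40-bit physical addresses
virtual = addressable_bytes(32)   # 32-bit virtual addresses per process

print(physical // TIB, "TiB physical")  # 1 TiB physical
print(virtual // GIB, "GiB virtual")    # 4 GiB virtual
```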
ARM already has 16 GPRs, so you can use them in pairs and have 8 registers for 64-bit operations. Not quite as many as x86-64, but twice as many as x86, so even that isn't much of an advantage. All of the other advantages that x86-64 has over x86, ARM has already.
I am TheRaven on Soylent News
ARM servers make sense in two places: the small and the giant. They fall down in the medium and large space.
In other words, my personal server currently runs a "low power" AMD Sempron. The CPU uses something like 40 Watts, and it is plenty fast enough for my needs. It makes my RAID work, and it serves stuff over NFS and Samba. There are only ever a few clients, and the CPU spends most days nearly idle. It's a small box with a small workload, and it would work just fine with an ARM CPU instead of an x86. (Assuming the hypothetical ARM system could physically connect my external RAID enclosure.) More CPU wouldn't hurt, and it would occasionally make a few things faster, but mostly putting a Xeon in this box would just make it louder.
In the realm of giant workloads, you have jobs that can't possibly be done by a single machine, no matter the budget. You are looking at needing many hundreds of even the biggest machines you can get. If you have a job that parallelizes that well, doing it with 1000 x86 boxes or 4000 ARM boxes isn't that big of a difference. If the ARM boxes are smaller, cheaper, and lower power enough that it outweighs the fact that you need more of them, then it would be crazy to go with whizzy Xeon boxes instead of Arm. Buzzword enthusiasts will throw labels like "Cloud scale computing" at this sort of thing.
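A back-of-the-envelope version of that trade-off. The 1000 and 4000 box counts come from the paragraph above; the wattages are invented placeholders, not measurements of any real hardware, and the function names are hypothetical.

```python
def fleet_power_watts(boxes: int, watts_per_box: float) -> float:
    """Total draw of a fleet of identical boxes."""
    return boxes * watts_per_box


def break_even_watts(big_box_watts: float, boxes_per_big_box: float) -> float:
    """Per-box draw below which the fleet of small boxes uses less power."""
    return big_box_watts / boxes_per_big_box


# ASSUMED numbers, for illustration only.
X86_BOXES, X86_WATTS = 1000, 300.0
ARM_BOXES, ARM_WATTS = 4000, 30.0

x86_total = fleet_power_watts(X86_BOXES, X86_WATTS)
arm_total = fleet_power_watts(ARM_BOXES, ARM_WATTS)
threshold = break_even_watts(X86_WATTS, ARM_BOXES / X86_BOXES)
```

With these made-up inputs the ARM fleet only wins on power if each box stays under the `threshold` value; purchase price, rack space, and networking would need the same kind of comparison.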
Where ARM falls down on the job is anything that can be done by a 4 core Xeon, up to a handful of 32 Core Xeons. That's a big chunk of what we normally think of as the Server market. ARM doesn't compete very well in this space. When people say that ARM is a ridiculous idea for servers, this middle segment of the market is generally what they are thinking of. A cluster of a dozen little ARM boxes competes rather poorly with a single machine with four Xeon sockets in terms of management overhead, and the amount of effort required to parallelise workloads, and the amount of bandwidth between distant cores. If you have an application that has an expensive per-machine license, that speaks in favor of a single big machine, etc.
So, small office that needs a little NAS server to stash under the secretary's desk? ARM can pwn the market. Giant research institution with some parallelisable code trying to figure out how molecules do something naughty during supernovas? ARM can pwn the market. "Enterprise" level IT in a smallish, but uncrowded data center with adequate, already provisioned power and cooling... ARM may well be suitable in some cases, but it's certainly not an easy sell.
And, relatively common cell phones have 1 GB of RAM. In two years or so, a cell phone with 4 GB of RAM will seem perfectly reasonable. At that point, 64 bit ARM stops being a data center/desktop issue, and is simply required to hold onto the existing ARM core market.
It should in theory scale better than x86-64 anyhow, and the performance per watt is quite superior, so yes, it has a major place in the server room.
One of the more amusing blog entries from Sun engineers was a discussion of the amount of energy needed to completely fill a ZFS file system. A 128-bit address space isn't just optimistically big, it's "freaking huge!"
http://blogs.sun.com/bonwick/entry/128_bit_storage_are_you
In large datacenters, power and cooling costs have become a significant part of the TCO. For smaller server rooms x86 compatibility is probably more important.
This isn't like the 16->32 bit transition where it quickly became apparent that the benefits were large enough and the costs both small enough and rapidly decreasing that all but the smallest microcontrollers could benefit from both the switch and the economies of scale. 64-bit pointers help only in select situations, they come at a large cost, and as fabs start reaching the atomic scale we're much less confident that Moore's Law will decrease those costs to the level of irrelevance anytime soon.
Most uses don't need >4 gigabytes of RAM, and it takes extra memory to compensate for huge pointers. Cache pressure increases, causing a performance drop. Sure, often x86-64 code beats 32-bit x86 code, but that's mostly because x86-64 adds registers on a very register-constrained architecture and partly because of wider integer and FP units. 64-bit addressing is usually a drag, and it's the addressing that makes a CPU "64-bit". ARM doesn't have a similar register constraint problem, and the cost of 64-bit pointers would be especially obvious in the mobile space, where cache is more constrained- one of the most important things ARM has done to increase performance in recent years was Thumb mode i.e. 16-bit instructions, decreasing cache pressure.
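To put a rough number on the cache-pressure point, here is a small sketch that lays out a hypothetical doubly linked list node (one 32-bit payload plus two pointers) with natural alignment, once with 4-byte and once with 8-byte pointers. The node shape and the 32 KiB cache size are assumptions chosen for illustration.

```python
def struct_size(field_sizes):
    """Size of a C-style struct whose fields are each naturally aligned."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def list_node_size(pointer_bytes: int) -> int:
    # Hypothetical node: int32 payload, prev pointer, next pointer.
    return struct_size([4, pointer_bytes, pointer_bytes])


def nodes_in_cache(cache_bytes: int, pointer_bytes: int) -> int:
    return cache_bytes // list_node_size(pointer_bytes)


CACHE = 32 * 1024  # assumed 32 KiB data cache
print(list_node_size(4), nodes_in_cache(CACHE, 4))  # 12 2730
print(list_node_size(8), nodes_in_cache(CACHE, 8))  # 24 1365
```

For a pointer-heavy structure like this one, doubling the pointer width halves how many nodes fit in the same cache.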
Most of those who do need more than 4GB don't need more than 4GB of virtual address space for a single process, in which case having the OS use 64-bit addressing while apps use 32-bit pointers is a performance boon. The ideal for x86 (which nobody seems to have tried) would be to have x86-64 instructions and registers available to programs but have the programs use 32-bit pointers, as noted by no less than Don Knuth:
It's funny to continually hear people clamoring for native 64-bit versions of their applications when that often will just slow things down. One notable instance: Sun/Oracle have told people all along not to use a 64-bit JVM unless they really need a single JVM instance to use more than 4GB of memory, and the pointer compression scheme they use for the 64-bit JVM is vital to keeping a reasonable level of performance with today's systems.
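A simplified model of how pointer compression of this kind can work, assuming every object is 8-byte aligned so the low three bits of an offset are always zero. This is a sketch of the general idea, not the JVM's actual implementation; `compress`, `decompress`, and the heap base used in the example are hypothetical.

```python
OBJECT_ALIGNMENT = 8  # bytes; assumed alignment of every object
SHIFT = 3             # log2(OBJECT_ALIGNMENT)


def compress(address: int, heap_base: int) -> int:
    """Turn a 64-bit address into a 32-bit reference relative to heap_base."""
    offset = address - heap_base
    if offset < 0 or offset % OBJECT_ALIGNMENT:
        raise ValueError("address must be aligned and inside the heap")
    ref = offset >> SHIFT
    if ref >= 1 << 32:
        raise ValueError("address is beyond the compressible range")
    return ref


def decompress(ref: int, heap_base: int) -> int:
    """Recover the full address from a 32-bit reference."""
    return heap_base + (ref << SHIFT)


# Largest heap that 32-bit references can cover under these assumptions.
MAX_HEAP = (1 << 32) * OBJECT_ALIGNMENT
```

Under these assumptions a 32-bit reference covers a 32 GiB heap, so references stay the size of 32-bit pointers while the process as a whole is 64-bit.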
Funny, but in 1990 I bet they said the same thing about Intel.
In any office of say 50 or so people a 64 bit ARM would probably do just fine. NAS and SANs in bigger installations would probably also run very well on a 64 bit ARM. And then one has to wonder just how many ARM cores might fit on a die?
ARM is a much more modern ISA than x86 so it will be interesting to see just where it goes. Trust me, if you had told anyone in 1982 that someday there would be an x86 that was faster per clock cycle than a Cray-1, ran with a multi-GHz clock, and had a 64 bit address space they would have locked you in a rubber room.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Well, that is interesting and all, just wondering about something a bit more modern. We have 1 GHz ARM in cellphones now, and larger coming, etc, which is enough with sufficient RAM to work as a modern desktop for most uses. I currently still run an old slow single core, works fine, but if I could get comparable performance at only 1/10th the electricity use and eliminate all the fans...see what I mean? Way back in 2008 Canonical announced serious ARM support and so on, but still no machines to buy from anyplace. I contemplated using a high end cellphone, but none of them have full keyboard and mouse support and are beastly expensive and you can't get any with at least two gigs of RAM, which is de facto about the tipping point for a desktop today between "works" and "tear your hair out".
I mean really, the chips themselves are wicked cheap compared to Intel or AMD, so where is a plain vanilla ARM based normal form factor desktop, ATX or miniITX or like that? Seems like they could be making a good enough desktop for some serious cost reductions and hit the niche that fits. Now I have an old VIA miniITX board but dang they require super expensive RAM (VIA specific, generic PC133 stuff does NOT work) just to get it to one full gig, and the 256 megs that I have just don't cut it. "Good enough", quiet, cheap to run and cheap is what I am after and it sure seems like an ARM solution would fit, just I can't find one, and I have looked now for two years off and on. I don't want a teeny netbook, I want a bog standard desktop cheap machine, just with ARM instead of AMD or Intel.
You can get a $50 zipit z2 and run debian arm on that. Fits in the palm of your hand and does all that.
Yes, emulation is an option, but I don't think that ARM running x86 emulation layer will be competitive with native x86 CPUs. Didn't this happen to Itanium? Slow x86 performance and AMD's x86-64 resulted in virtually zero market for Itanium.
...considering how much software, both for Windows AND Linux, that isn't for ARM based CPUs...
CPU architecture doesn't really matter with FOSS - once you have a working compiler, you just compile everything from source. Alright, you need some arch-specific work in the kernel and a few other places too. But by the time you get to end-user applications, all of that is long gone. So I would reply with "almost all Linux software already is for ARM-based CPUs". Or MIPS. Or POWER/PowerPC. Or whatever architecture you want.
And one advantage that ARM's low power/heat could bring is high density. Take a look at the Gumstix boards. Now imagine a "blade server" board with 16 or more processors crammed onto one board. You could easily get at least a few hundred CPU's in a 19 inch rack, with each CPU draining less than a watt of power. Now I'm not really sure what could be done with such a system - either do everything over the network (NFS or ATAoE), or equip each CPU with a good lump of flash storage for data and programs. But it would draw very little power and is something to think about.
There are a lot of boxes out there doing nothing but serving files and printers. If ARM did start to be popular, you can be sure that MS would make sure not to lose that business. And then, once you have the things installed, it suddenly makes sense to write some of your new programs to run on them...
To be fair, that doesn't counter his argument, amd64 has more registers than i386 and they do make a big difference. Repeat the tests with 32-bit pointers and 64 bit registers and then get back to us.
As of today, the method he mentions would probably provide a bit better performance, assuming the processor optimizations didn't break when their expectations weren't met.
However, I think it is very short-sighted to miss the fact that about the only thing increasing these days is memory and that apps tend to grab all the address space they can get. By 2050 I can see machines with 1TB ram, but I can't see apps keeping themselves under 0xFFFFFFFF.
Furthermore, thanks to ASLR, which is a feature available now on most OSes, address space fragmentation is a problem today even for programs well under the 4GB mark. The future is 64:64. 32 bit architectures are already dead, they just haven't realized it yet.
10 little-endian boys went out to dine, a big-endian carp ate one, and then there were -246.
But by the time you get to end-user applications, all of that is long gone.
C and the C like bits of C++ are a very leaky abstraction.
Take unaligned accesses for example. Some architectures will just quietly fix them up. Some will terminate your app with a SIGBUS and some will return bogus results (older ARM chips did the last of these; with modern ARM chips the kernel can trap it but iirc it doesn't by default).
And then there is the fun of va_list. On x86 it's a simple pointer, on other architectures it's a more complex structure and this can cause problems if you try to use it in certain ways.
As someone who has watched Debian RC bugs, architecture-specific failures are not at all unusual. Sometimes it is actual bugs in the toolchain, other times it's portability issues in the user code.
For common FOSS these issues have already been largely fixed (at least to the extent that they broke something obvious) because of the work of projects like debian but if you have custom C or C++ code written by code monkeys then you have a problem.
And if your custom code is in java you potentially have a much bigger problem. There is an arm port of openjdk but it's rather immature at the moment. There is gcj too but don't expect good compatibility there.
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
It's not as slow on x86, but it's sufficiently slow that you want to avoid it. Basically, no one is doing the kind of horrible pointer casting that generates this kind of access on x86 in anything performance critical.
It's also worth noting that compilers will never generate unaligned loads if they can avoid it. If the compiler can tell that the load might be unaligned, it will generate a pair of load-mask-shift sequences and xor the results together (the shift is free on ARM, so this is not as expensive as it could be). It's only if you do some evil pointer arithmetic and casting that makes the compiler think it's got an aligned pointer when it actually has an unaligned one that you get problems.
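A toy model of that load sequence, with a byte string standing in for memory and little-endian 32-bit words assumed. `aligned_load32` refuses misaligned addresses the way a strict architecture would; `unaligned_load32` builds the value from the two aligned words that straddle it. Both names are made up for illustration, and this sketch combines the halves with `|`, which gives the same result as xor here because the two shifted halves never share a set bit.

```python
def aligned_load32(memory: bytes, address: int) -> int:
    """Load a little-endian 32-bit word; only aligned addresses are allowed."""
    if address % 4:
        raise ValueError("unaligned access")
    return int.from_bytes(memory[address:address + 4], "little")


def unaligned_load32(memory: bytes, address: int) -> int:
    """Build a possibly unaligned 32-bit load out of aligned loads."""
    base = address & ~3          # aligned word containing the first byte
    shift = (address & 3) * 8    # how far into that word the value starts
    low = aligned_load32(memory, base)
    if shift == 0:
        return low               # already aligned, one load is enough
    high = aligned_load32(memory, base + 4)
    # Low bytes come from the top of the first word, high bytes from the
    # bottom of the second; the mask drops bits shifted past bit 31.
    return ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF


memory = bytes(range(16))
print(hex(unaligned_load32(memory, 1)))  # 0x4030201
```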
I've got eight 8-bit AVRs and duct tape right here. That's almost the same thing.