ARM Chips Designed For 480-Core Servers

Going to be expensive! by ikarys · 2011-03-13 21:27 · Score: 5, Funny

It'll likely cost an ARM and a leg.

Re:Going to be expensive! by symbolset · 2011-03-13 21:59 · Score: 1

No.

--
Help stamp out iliturcy.
Re:Going to be expensive! by lwsimon · 2011-03-13 22:49 · Score: 1

Nice. I was thinking "My God... It's full of cores!"

--
Learn about Photography Basics.
Re:Going to be expensive! by SimonTheSoundMan · 2011-03-13 23:34 · Score: 1

Mmm, reminds me of the prototype card for Acorn computers that had 32 600MHz ARM processors. They never released an estimated price, though. This was back in the early 2000s, so it would have been incredibly expensive. Cortex A9s are now in mass production, not in the hundreds or low thousands that Acorn used to make, so it might be cheaper than you think.
Re:Going to be expensive! by fuzzyfuzzyfungus · 2011-03-14 00:00 · Score: 2

I suspect that cost will largely boil down to the "fabric", type unspecified, and whatever the "because we can" premium for this device happens to be.

Since the A9s are in mass production, and have some vendor competition, they should be reasonably cheap, and of basically knowable price; but, depending on what sort of interconnect this thing has, you could end up paying handsomely for that. "Basically ethernet; but cut down to handle short signal paths over known PCBs" shouldn't be too bad; but if it is some sort of custom NUMA unified memory thing, bend over and open your checkbook...
Re:Going to be expensive! by chrishillman · 2011-03-14 00:20 · Score: 1

I am dying.. you have killed me. Way too funny for a Monday morning. Now I am at work literally laughing out loud and I can't explain what is funny to anyone who will get it... I am dead inside, killed by your humorous post...
Re:Going to be expensive! by 605dave · 2011-03-14 01:23 · Score: 1

Wish I could mod you up. High-larious.

--
Be kind, for everyone you meet is fighting a difficult battle. - Plato
Re:Going to be expensive! by lwsimon · 2011-03-14 08:17 · Score: 1

Your approval is enough for me. Consider me modded appropriately.

--
Learn about Photography Basics.
Re:Going to be expensive! by badkarmadayaccount · 2011-03-15 00:01 · Score: 1

The second they try something smart - someone is gonna pull a HT interconnected chip and screw them over. Though the cache coherency protocol may be an issue. OTOH, 4x1Gbps backplane Ethernet, is standard, and wouldn't be too expensive to slap between chips, and let Xen in cluster mode handle the cache coherency. Beowulf SSI in a box. Sounds nice.

--
I know tobacco is bad for you, so I smoke weed with crack.

Cheaper way by eclectro · 2011-03-13 21:32 · Score: 1

Have a beowulf cluster of cell phones.

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"

Re:Cheaper way by Anonymous Coward · 2011-03-14 00:12 · Score: 1

The service contracts or ETF charges would cost way more than the server would.
Re:Cheaper way by jDeepbeep · 2011-03-14 01:09 · Score: 4, Funny

Nah, too RISCy

--
Reply to That ||
Re:Cheaper way by binarylarry · 2011-03-14 01:30 · Score: 1

Don't be a CISCy

--
Mod me down, my New Earth Global Warmingist friends!
Re:Cheaper way by binarylarry · 2011-03-14 02:50 · Score: 1

Nice.

--
Mod me down, my New Earth Global Warmingist friends!

is it worth it? by metalmaster · 2011-03-13 21:36 · Score: 2

When you start piling all you can onto a chip, the power consumption is naturally going to creep up. Once you reach a certain threshold of x chips, you lose the benefit of ARM being "low-power." Am I wrong?

Re:is it worth it? by swalve · 2011-03-13 21:43 · Score: 3, Insightful

It's low power in that the cores that aren't being used can (I assume) be shut down. Like a switchmode power supply versus a linear one. So you are always using the least amount of power possible.
Re:is it worth it? by L4t3r4lu5 · 2011-03-13 21:44 · Score: 5, Interesting

Cortex A9 is 250mW per core at 1GHz

You're looking at, for a 240 core 2U node, 60W for CPUs. Pretty impressive.
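
The arithmetic behind that figure, as a quick sketch. The 250mW-per-core number is the one quoted above; the function name is made up for illustration, and the sum covers the cores only (no RAM, interconnect, or power-supply losses).

```python
def cpu_power_watts(cores, milliwatts_per_core=250):
    """Total CPU power in watts, counting the cores and nothing else."""
    return cores * milliwatts_per_core / 1000


print(cpu_power_watts(240))  # 60.0, the 2U figure quoted above
print(cpu_power_watts(480))  # 120.0, the 480-core count from the headline
```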

--
Finally had enough. Come see us over at https://soylentnews.org/
Re:is it worth it? by arivanov · 2011-03-13 21:53 · Score: 1

5W average, so let's assume up to 10W per CPU according to the article.
Not bad. In fact good enough to replace completely a commercial non-metered hosted VM offering of the kind memset (http://www.memset.co.uk/) offers at present.
The interesting question here is what is the interconnect between them. After all, who cares that you have 480 cores in 2U if 90% of the time they are twiddling their thumbs waiting for data to be delivered to them.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:is it worth it? by Bert64 · 2011-03-13 21:57 · Score: 1

That is the benefit of ARM: the threshold for how many chips you can have is much higher because each individual chip uses less power.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Re:is it worth it? by somersault · 2011-03-13 22:00 · Score: 1

A lot of servers are idling for most of the day, but you need them to be able to scale up quickly at certain peak times.

--
which is totally what she said
Re:is it worth it? by symbolset · 2011-03-13 22:03 · Score: 1

Yes, you are wrong.

--
Help stamp out iliturcy.
Re:is it worth it? by oranGoo · 2011-03-13 22:03 · Score: 1

If(!) ARM is more energy efficient then it delivers more processing power per unit of power. Principle works the same at 250mW and at 600W. It would also generate less heat. Ability to turn the cores on and off is additional benefit that would further improve efficiency.
Re:is it worth it? by Sulphur · 2011-03-13 22:19 · Score: 1

A lot of servers are idling for most of the day, but you need them to be able to scale up quickly at certain peak times.
Do you mean power up quickly?
Re:is it worth it? by fuzzyfuzzyfungus · 2011-03-13 22:30 · Score: 4, Interesting

It really depends on how much(and what kind of) support hardware ends up being involved in having lots and lots of them together in some useful way. That and what inefficiencies, if any, are present because your workload was really expecting a smaller number of higher-performance cores.

The power/performance of the core itself remains the same whether you have 1 or 1 million. The power demands of the memory may or may not change: phones and the like usually use a fairly small amount of low-power RAM in a package-on-package stack with the CPU. For server applications, something that takes DIMMS or SODIMMs might be more attractive, because PoP usually limits you in terms of quantity.

The big server-specific questions are going to be the nature of the "fabric" across which 120 nodes in a 2U are communicating. Because 120 ports worth of 10/100 or GigE would occupy 3Us and nonzero power themselves, I'm assuming that this fabric is either not ethernet at all, or some sort of cut-down "we don't need to care about the standards because the signal only has to travel 6 inches over boards we designed, with our hardware at both ends" pseudo-ethernet that looks like an ethernet connection for compatibility purposes; but is electrically more frugal. Whatever that costs, in terms of energy, will have to be added on to the effective energy cost of the CPUs themselves.

Then you get perhaps the most annoying variable: Many tasks are(either fundamentally, or because nobody bothered to program them to support it) basically dependent on access to a single very fast core, or to a modest number of cores with very fast access to one another's memory. For such applications, the performance of 400+ slow cores is going to be way worse than a naive addition of their individual powers would suggest. Sharing time on a fast core is both fundamentally easier, and enjoys a much longer history of development, than does dividing a task among small ones. With some workloads, that will make this box nearly useless(especially if the interconnect is slow and/or doesn't do memory access). For others, performance might be nearly as good as a naive prediction would suggest.
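
One common way to put rough numbers on that last point is Amdahl's law, which the post does not name. The sketch below assumes a slow core is a tenth the speed of a fast one and compares 480 slow cores against 8 fast ones; both figures and the parallel fractions are illustrative assumptions, not measurements of this hardware.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over a single core when only part of the work parallelises."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, core_speed):
    """Throughput relative to one core of speed 1.0."""
    return core_speed * amdahl_speedup(parallel_fraction, cores)


# Assumed setup: 480 slow cores at 0.1x speed versus 8 fast cores at 1.0x.
for p in (0.5, 0.9, 0.999, 1.0):
    slow = relative_throughput(p, cores=480, core_speed=0.1)
    fast = relative_throughput(p, cores=8, core_speed=1.0)
    print(f"parallel fraction {p}: many slow = {slow:.2f}, few fast = {fast:.2f}")
```

Under these assumptions the many-core box only pulls ahead when almost all of the work is parallel, which matches the "nearly useless for some workloads, nearly ideal for others" conclusion.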
Re:is it worth it? by somersault · 2011-03-13 23:07 · Score: 3, Interesting

Not really, the server could stay powered up the whole time (unless you really get 0% usage at non-peak times, and those times are predictable, in which case it makes sense to just power down completely at those times). By scaling up I mean enabling more cores, thus improving the processing capacity of the server. Then you'd get the best of both worlds, with the server being fine for anything from small to massive workloads, while still using less power than the equivalent x86 setup. Like modern engines which can enable or disable cylinders at will to conserve fuel when not much power is needed.

--
which is totally what she said
Re:is it worth it? by TheRaven64 · 2011-03-13 23:24 · Score: 1

TFA said 5W per node, meaning per 4 cores + RAM. That's 600W for the entire system, which is fine for a 2U enclosure.
Aside from the interconnect, the other important question is how much RAM are they going to have? They're using the Cortex A9, not the A15, so they just have a 32-bit physical address space. In theory, this lets them have 4GB of RAM per node (1GB per core), but some of that needs to be used for memory-mapped I/O, so I'd be surprised if they got more than 3GB, maybe only 2GB. That would mean only 512MB per core, which is a little bit tight for a lot of workloads.
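
A back-of-the-envelope version of those numbers, as a sketch. The 120 nodes, 4 cores per node, and 5W per node are the figures above; the usable-RAM values are the guesses from this post, and the helper names are made up.

```python
NODES = 120
CORES_PER_NODE = 4
WATTS_PER_NODE = 5


def system_watts(nodes=NODES, watts_per_node=WATTS_PER_NODE):
    """Total power for the enclosure, counting nodes only."""
    return nodes * watts_per_node


def ram_per_core_mb(usable_gb, cores=CORES_PER_NODE):
    """RAM per core in MB if a node's usable RAM is split evenly."""
    return usable_gb * 1024 // cores


print(system_watts())  # 600
for usable_gb in (4, 3, 2):
    print(usable_gb, "GB per node ->", ram_per_core_mb(usable_gb), "MB per core")
```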

--
I am TheRaven on Soylent News
Re:is it worth it? by SlashV · 2011-03-14 00:25 · Score: 1

The analogy with a switchmode power supply is completely b0rked. It doesn't contain any cars. (Furthermore, switching off cores in a multicore server is completely unlike the 'switching' in a switch mode power supply.)
Re:is it worth it? by wvmarle · 2011-03-14 00:44 · Score: 2

Most servers do not do heavy computing work: they serve up (dynamic) web pages, handle SQL queries, process e-mail, serve files. That sounds to me like lots and lots of threads that each have relatively little work to do.
For example /.: the serving of a single page to a single visitor will take a few dozen SQL queries and the running of a Perl script to stitch it all together. This takes, say, 0.001 seconds of time of an x86 core - a wild guess, may be an order of magnitude off, good enough for the sake of the argument. An ARM core is maybe a tenth of that speed, so that single page would need 0.01 seconds of processing power to build up. And that is assuming the processor is the bottleneck. Likely the network to access the SQL servers is the bottleneck, which may end up the same overall time to build up that web page.
But now there are thousands upon thousands of visitors - all requesting pages. As this all goes parallel, it would simply require ten ARM cores to replace one x86 core and retain the same overall output.
Indeed, when you're doing heavy scientific calculations, ARM definitely won't stand a chance. But web pages won't even need you to do any floating point arithmetic. The same goes for handling an e-mail queue. It's I/O that's important: the capacity to move the correct bits from A to B. And from what I've learned about these processors, I don't think ARM does that much worse than x86. So depending on the server load, there may really be something to it, especially as those ten ARM cores use just a fraction of the power of a single x86 core.
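
The "ten ARM cores per x86 core" estimate can be written out as a small sketch. The per-page CPU times are the wild guesses from this post (1ms on x86, 10ms on ARM), expressed in microseconds to keep the arithmetic in integers; the function names are made up, and the model assumes requests are fully independent and CPU-bound.

```python
US_PER_SECOND = 1_000_000


def pages_per_second(cores, cpu_us_per_page):
    """Aggregate page rate if every core serves pages independently."""
    return cores * US_PER_SECOND // cpu_us_per_page


def cores_to_match(target_pages_per_second, cpu_us_per_page):
    """Smallest core count that reaches the target rate (ceiling division)."""
    return -((-target_pages_per_second * cpu_us_per_page) // US_PER_SECOND)


x86_rate = pages_per_second(cores=1, cpu_us_per_page=1_000)
print(x86_rate)                                            # 1000
print(cores_to_match(x86_rate, cpu_us_per_page=10_000))    # 10
```

If the SQL round trips dominate, as suggested above, the per-page CPU time matters even less than this model implies.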
Re:is it worth it? by poetmatt · 2011-03-14 01:59 · Score: 1

There are two arguments for hardware in enterprise. 1: Performance-per-watt ratio. This is substantially more capable than just about anything out there for x86 right now, shy of supercomputers.
Re:is it worth it? by wagnerrp · 2011-03-14 03:20 · Score: 1

512MB per core really isn't bad at all, when you consider that core has about the same performance of a 10yr old Pentium 3.
Re:is it worth it? by npsimons · 2011-03-14 04:13 · Score: 1

It really depends on how much(and what kind of) support hardware ends up being involved in having lots and lots of them together in some useful way. That and what inefficiencies, if any, are present because your workload was really expecting a smaller number of higher-performance cores.
I've been saying for years that people should make their chunks of code smaller (eg, smaller functions, et al) so it's easier to understand and maintain. The old argument has always been that the compiler will inline it even if you don't tell it to. I think now, looking towards the future, it's obvious that parallelization will be what drives performance. Code that is already broken down into smaller chunks will scale better to a large number of cores. I guess what I'm trying to say is: break your code down, even beyond what you think is too much; the compiler can inline it for beefier, lower core CPUs, and given the proper backends, automatically thread it to lower power, massively cored architectures. Plus you get the not insignificant bonus of more maintainable code!

--
Nathan's blog
Re:is it worth it? by del_diablo · 2011-03-14 08:45 · Score: 1

Benchmark to back the claim up?
Besides, ARM does not suffer from some of the insane x86 problems.
Re:is it worth it? by wagnerrp · 2011-03-14 09:38 · Score: 2

The comment wasn't intended to be derogatory against the ARM. The ARM was just designed from the ground up with low power consumption in mind, not performance. The Cortex A9 has an 8-stage pipeline, 2.5 instructions per clock, around 13M transistors per core, runs at 800MHz to 1.5GHz, and has up to 512KB of L2 cache. The Pentium 3 has a 10-stage pipeline, 2.5 instructions per clock, around 10M transistors, runs at 500MHz-1.4GHz, and has up to 512KB of L2 cache. They're fairly comparable processors, with the ARM probably having a better instruction dispatcher and branch predictor, and the P3 having better floating point performance.
While it doesn't have a lot of power comparable to modern x86 chips, it absolutely blows them away in performance per watt. It's a much better prospect for low power systems than the Atom, where Intel effectively tossed out 15 years of microprocessor design ripping out parts to cut power consumption.
Re:is it worth it? by Alex+Belits · 2011-03-14 10:30 · Score: 1

600W per 2U server is possible but very impractical -- a full rack will require 12kW to power it (and 12kW of cooling).
I also don't believe they thought through how to stuff 120 processors and at least 120 DIMMs into a 2U case and cool them efficiently -- one ARM CPU requires no forced-air cooling, and one DIMM can be cooled by whatever blows around for other reasons, but 120 of them need airflow, and plenty of it. If they don't use separate DIMMs and have fixed RAM (I hope it's ECC and enough to run a database server), they also have to deal with a giant footprint and tricky layout.

--
Contrary to the popular belief, there indeed is no God.
Re:is it worth it? by TheRaven64 · 2011-03-14 11:23 · Score: 1

They almost certainly aren't using DIMMs. To get the power consumption that they talk about, they'll be using MobileDDR in a package-on-package (PoP) configuration. This means that the ARM SoC and the memory are cooled as a single unit.

--
I am TheRaven on Soylent News
Re:is it worth it? by TheRaven64 · 2011-03-14 11:36 · Score: 2

They're fairly comparable processors, with the ARM probably having a better instruction dispatcher and branch predictor, and the P3 having better floating point performance.
The ARM chip probably doesn't have a better branch predictor. The Pentium 4 had a very good one, which was back-ported to the Pentium-M. The Pentium 3 one was pretty good. ARM chips didn't have one at all until very recently, because branch prediction is much less important with the ARM ISA.
A lot of ARM instructions are predicated, meaning that they are evaluated, but their results are only retired if a specific condition register is set. Branch prediction on x86 is very important, because short if sequences cause a pipeline stall if they are not correctly predicted. For example, consider this made-up snippet:

if (x % 2) { x++; }

With an x86 chip, this will be a conditional branch to skip over the increment. The Pentium 3 branch predictor should get this right most of the time, but if it gets it wrong then you have to flush all of the instructions that were put into the pipeline after the branch instruction (which can be quite a lot, but is probably around 10 in a typical case). In contrast, the ARM version will just use the predicated version of the increment instruction, so the worst that happens is that you lose one cycle.
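
Python cannot show pipeline behaviour, but it can model the data-flow idea behind predication: the increment is always computed, and the condition only decides whether it takes effect. This is an illustrative branch-free equivalent of the snippet above, not what any compiler actually emits, and the function names are made up.

```python
def increment_if_odd_branching(x):
    """The snippet above, written with an explicit conditional branch."""
    if x % 2:
        x += 1
    return x


def increment_if_odd_branch_free(x):
    """Same result with no branch: the low bit acts as the predicate."""
    return x + (x & 1)


for value in (-3, -2, 0, 1, 2, 7):
    assert increment_if_odd_branching(value) == increment_if_odd_branch_free(value)
    print(value, "->", increment_if_odd_branch_free(value))
```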
For longer branches, the cost of a pipeline stall is less important relative to the overall cost of execution, but it's still quite important. Older ARM chips had very short pipelines, so it wasn't really worth bothering wasting power on a branch predictor. Newer ones do branch prediction, but you can turn off the predictor to save power.
Comparing ARM instructions per clock and x86 instructions per clock is pretty hard. x86 has some trivial instructions and some incredibly complex ones. ARM instruction density is often very good - it's about the only ISA that regularly beats x86. For example, ARM load instructions get a free barrel shift, which makes array indexing very fast - often a single instruction for accessing an array element.

--
I am TheRaven on Soylent News
Re:is it worth it? by Alex+Belits · 2011-03-14 13:21 · Score: 1

Not with the density of power they are trying to achieve -- it will mean that 10-20% of all power dissipation will happen on chips with nearly perfect thermal insulation around them (board, layer of air and another chip). It will be probably the first device ever to overheat ARM with the heat it produced. Even if air will be eliminated, RAM chips are not good at conducting heat from bottom to top.
It will make sense to place RAM on the opposite side of the board, and have airflow on both sides, but again, it's 600W. Imagine a board with Intel CPU plus 50 DIMMs (that will be 50-100G of RAM) in 2U -- that would be a similar kind of challenge to power and cool.

--
Contrary to the popular belief, there indeed is no God.

WANTED: 1U low-power rack server by inflex · 2011-03-13 22:09 · Score: 1

Right now I'm running an Intel D510 rack server with dual 2.5" drives, it's great, does a lovely job even with it running Ubuntu 10.04 server + VirtualBox ( Ubuntu 8.04 LTS ), however, I'd dearly love to shift over to something even more low-power/compact/SOC, so long as it has SATA, Ethernet, USB and runs a debian-based distro I'd be happy.

Something like a dual-core ARM machine would be ample for the server loads I'm seeing.

So, anyone seen anything like that yet? Or even just a MB in Mini-ITX ?

(btw, why is it that Intel HT enabled still seems to cause random hangs... or maybe it's just coincidental).

Re:WANTED: 1U low-power rack server by Anne+Thwacks · 2011-03-13 22:24 · Score: 1

I want one too (probably three). But I want to run OpenBSD on mine.

--
Sent from my ASR33 using ASCII
Re:WANTED: 1U low-power rack server by TheRaven64 · 2011-03-13 23:28 · Score: 2

Take a look at the PandaBoard, if you want a low-power, dual-core ARM server, although you'd have to use CF + USB for storage, not SATA. Note, however, that VirtualBox is x86-only. If you want virtualisation, you're currently pretty limited on ARM. There is a Xen port, but it's not really packaged for end users yet.

--
I am TheRaven on Soylent News
Re:WANTED: 1U low-power rack server by espiesp · 2011-03-13 23:29 · Score: 2

While not in 1U format, a lot of off-the-shelf NAS boxes use ARM. My LG N2R1 NAS has an 800MHz Marvell 88F6192 and runs Lenny. I won't be surprised to see some NanoITX boards out running similar hardware. Plus, I've been very impressed with how many Debian packages are available for ARMEL. While not perfect, it's the most useful Linux server I've ever had.
Re:WANTED: 1U low-power rack server by vlm · 2011-03-13 23:33 · Score: 1

How's the dual drive support on the Sheeva plug? Looks like the Pogo also uses USB as its "drive interface".
Something like a Soekris board / case that handles two SATA drives in a RAID mirror would be nice.
The best bet for the original poster is to ask the mythtv guys for low power / fanless options, and stuff it all into a 1U case (assuming rackmount is mandatory)

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Re:WANTED: 1U low-power rack server by inflex · 2011-03-13 23:39 · Score: 1

That's a good point about the NAS systems, they're comparatively cheap too!
Re:WANTED: 1U low-power rack server by SuricouRaven · 2011-03-14 00:25 · Score: 1

Pogoplugs are toasty. They've been plagued by overheating issues.
Re:WANTED: 1U low-power rack server by Nursie · 2011-03-14 00:39 · Score: 2

You need to watch out with them also though. The WD Sharespace I have uses a 500MHz chip which is totally inadequate for decent throughput between the 4-disk array and the GigE interface.
And I had to write my own device support into the kernel to get it running a modern OS! It came with 2.6.12!
Re:WANTED: 1U low-power rack server by inflex · 2011-03-14 01:12 · Score: 1

Thanks - I've seen some Netgear MS-2000 ones on sale recently for about $130 AUD. and then the RND-2000 for $250.
Meh, maybe I'll just wait for AMD to bring out their "low power" options in Mini-ITX :sigh:
Re:WANTED: 1U low-power rack server by fnj · 2011-03-14 01:15 · Score: 1

Why does the spec page omit the single most important spec: power consumption?
Re:WANTED: 1U low-power rack server by inflex · 2011-03-14 01:27 · Score: 1

A shame, even with 50% off on some, they're as expensive as something like a FitPC2 :(
I'm hoping at some point we can see a $99 personal server option, maybe cram 4~6 into a 1U rack.
Re:WANTED: 1U low-power rack server by StuartHankins · 2011-03-14 02:10 · Score: 1

I bought an RND-2000 and 2 fairly slow 2TB drives (5900 rpm for less noise) since it was to be installed in my bedroom. I got the whole thing shipped with 2 drives for around $430

Software-wise it's fairly nice, with support for Time Machine, AFP, CIFS etc and works great for any single task. But ask it to do more than 1 task and it just doesn't have the horsepower -- for instance copying a large file and trying to play a song causes the song playback to be delayed. If you're using an iPad to stream music or video that also works fine -- unless there's a Time Machine backup going. Then you are delayed; you can't even navigate to different folders from the iDevice. The RISC chip used in the RND-2000 is just soooo slow. Although I can ssh to it (a big plus when the AFP goes nuts and I can no longer delete folders with strange names) and even use rsync on it, it's substantially faster to mount the drive and run rsync from my Mac... this thing is really CPU-bound.

The good news is that while it's copying a file, it gets around 2GB/minute with journaling disabled, jumbo frames turned on, over a GbE network which is pretty good. I know the next model up is around $1000 but I would probably go with the upgrade unless it's truly something you want to use as a single person and don't need simultaneous stuff going on.
Re:WANTED: 1U low-power rack server by wagnerrp · 2011-03-14 03:39 · Score: 1

The MythTV guys have completely different needs than an underutilized server operator. We have to deal with a very complex scheduler, which if it takes too long to run can cause problems, and with HD video that typically can only be decoded single threaded. Single threaded performance, and a lot of it, is a must, meaning our minimum recommendation is 2.5GHz Core 2 or Athlon II, or better.
That's not to say you can't be low power while you're at it. Tom's Hardware did an article last year where, without considerable effort, they put together a 3.33GHz dual core i5 that idled under 25W. Even better, one of the Mac Mini XServes would idle at less power than your existing Atom. It's always nice to have the headroom available should you want it in the future, and at 25W, it's only going to consume maybe $50 more electricity over a 5yr life than that Atom system.
Re:WANTED: 1U low-power rack server by Nursie · 2011-03-14 04:32 · Score: 1

Wow, that is *awesome* compared to the max transfer of around 24MB (bytes at least, not bits) I get out of the sharespace.
That's over vanilla ftp and the processor is max'd at that point. Not the drives or the network interface, the processor. Dammit so much...
Re:WANTED: 1U low-power rack server by aztektum · 2011-03-14 05:04 · Score: 1

Good luck getting one of those in your hands. My coworker right across the aisle ordered one in January. Still not sure when it will ship.

--
:: aztek ::
No sig for you!!
Re:WANTED: 1U low-power rack server by fatphil · 2011-03-14 10:21 · Score: 1

Do any of the later offerings that followed SheevaPlug, such as GuruPlug, do what you want?

--
Also FatPhil on SoylentNews, id 863
Re:WANTED: 1U low-power rack server by fnj · 2011-03-14 16:21 · Score: 1

Hence the question, why would they omit that most salient fact?
Re:WANTED: 1U low-power rack server by jarlsberg71 · 2011-03-15 07:06 · Score: 1

I've had miserable performance with mine. Start moving data to it and the interface comes back with "Too Busy!" for 2 weeks. Then it slowed down and needed to be rebooted.

--
E8B8B
Re:WANTED: 1U low-power rack server by Nursie · 2011-03-15 16:23 · Score: 1

It's pretty damned poor, yup. I figured the onboard software was probably crap so I hacked mine to hell:
Managed to find the onboard serial pins and solder on a line-levelling serial adaptor, downloaded the WD GPL source, translated the needed Orion/Marvell code tree settings to modern/mainline kernel initialisation code, built a whole bunch of custom kernels, figured out the internal flash layout and how to create u-boot kernel images and initramfs images, and eventually got it to boot Debian squeeze.
And it still sucks!
They basically just totally underpowered the machine on the processor front.

Re:And it's useless. No 64-bit support. by jabjoe · 2011-03-13 22:25 · Score: 1

Do many websites need a 64bit memory range? I don't think so. Big database servers and the like, yes, but I doubt many website servers.

Re:So... by Anonymous Coward · 2011-03-13 22:30 · Score: 1

I think you would have more luck over at ExpertSexchange.

Try titling your post 'Urgent: I password-protected my 1TB porn collection and I forgot my p/w'.

Re:And it's useless. No 64-bit support. by Cyberax · 2011-03-13 22:39 · Score: 1

Yes, they do. First, if you're hosting a single website on a single server then you'll probably want to install more than 4GB just because RAM is so cheap now. And you'll inevitably use it (for databases, file cache, etc.). If you're hosting multiple sites on a single server, then you DEFINITELY need more than 4GB of RAM per server (as it's going to be the limiting component).

Maybe ARM is justified for large Google-style server farms doing specialized work which does not require great amounts of RAM.

160 more by Hognoxious · 2011-03-13 22:49 · Score: 1, Funny

Another 160 and that should be enough for anybody!

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:160 more by Falconhell · 2011-03-13 23:57 · Score: 1

Damnit, second last post currently and you beat me to that joke!

Re:And it's useless. No 64-bit support. by GeLeTo · 2011-03-13 22:49 · Score: 2

ARM's Large Physical Address Extensions (LPAE) allow access to up to 1TB of memory. While I doubt applications will use this, it will allow each virtualized host on the server to use 4GB of memory.

Re:And it's useless. No 64-bit support. by JackDW · 2011-03-13 23:29 · Score: 1

It couldn't be an SMP machine though, not with so many cores.

My bet would be that each of the 120 nodes actually is a complete computer with 4 cores and its own memory - linked to the other 119 only via Ethernet. In this arrangement the 32-bit memory limit is not such a big issue. Each individual machine will not be particularly powerful anyway.

--
You're an immobile computer, remember?

Re:And it's useless. No 64-bit support. by SuricouRaven · 2011-03-13 23:34 · Score: 1

Even programs that you wouldn't expect to need much memory often benefit heavily, as any modern desktop or server OS uses free RAM for disk caching. Adding more memory means fewer slow, slow disk reads are needed.

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-13 23:41 · Score: 4, Informative

How about a link to this rant, if you want us to read it? And, if you've got a problem with PAE-like extensions, then I presume you're aware that both Intel's and AMD's virtualisation extensions use PAE-like addressing?

All that PAE and LPAE do is decouple the size of the physical and virtual address spaces. This is a fairly trivial extension to existing virtual memory schemes. On any modern system, there is some mechanism for mapping from virtual to physical pages, so each application sees a 4GB private address space (on a 32-bit system) and the pages that it uses are mapped to some pages of physical memory. With PAE / LPAE, the only difference is that this mapping now lets you map to a larger physical address space - for example, 32-bit virtual to 36-bit physical. You see exactly the opposite of this on almost all 64-bit platforms, where you have a 64-bit virtual address space but only a 40- or 48-bit physical address space.
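
To make those address widths concrete, here is a small sketch that converts a width in bits to the size of the space it can address. The helper names are made up, and units are binary (1GB = 2^30 bytes).

```python
def address_space_bytes(bits):
    """Number of bytes addressable with an address of the given width."""
    return 1 << bits


def human(n):
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n}{units[i]}"


for bits in (32, 36, 40, 48):
    print(f"{bits}-bit -> {human(address_space_bytes(bits))}")
```

The 40-bit row is where the 1TB figure for LPAE mentioned earlier in the thread comes from.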

The big problem with PAE was that most machines that supported it came with 32-bit peripherals and no IOMMU. This meant that the peripherals could do DMA transfers to and from the low 4GB, but not anywhere else in memory. This dramatically complicated the work that the kernel had to do, because it needed to either remap memory pages from the low 4GB and copy their contents or use bounce buffers, neither of which was good for performance (which, generally, is something that people who need more than 4GB of RAM care about).

The advantage is that you can add more physical memory without changing the ABI. Pointers remain 32 bits, and applications are each limited to 4GB of virtual address space, but you can have multiple applications all using 4GB without needing to swap. Oh, and you also get better cache usage than with a pure 64-bit ABI, because you're not using 8 bytes to store a pointer into an address space that's much smaller than 4GB.
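
The cache point can be illustrated with a rough count of how many pointers fit in one cache line. The 64-byte line size is an assumption for illustration (real line sizes vary by CPU), and the helper name is made up.

```python
import struct

# Standard-size formats: "<I" is a 4-byte unsigned int, "<Q" an 8-byte one.
POINTER_32_BYTES = struct.calcsize("<I")
POINTER_64_BYTES = struct.calcsize("<Q")


def pointers_per_line(pointer_bytes, line_bytes=64):
    """How many pointers of the given size fit in one cache line."""
    return line_bytes // pointer_bytes


print(pointers_per_line(POINTER_32_BYTES))  # 16
print(pointers_per_line(POINTER_64_BYTES))  # 8
```

Pointer-heavy data structures therefore cover roughly twice as many links per cache line under a 32-bit ABI.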

By the way, I just did a quick check on a few 64-bit machines that I have accounts on. Out of about 700 processes running on these systems (one laptop, two servers, one compute node), none were using more than 4GB of virtual address space.

--
I am TheRaven on Soylent News

Re:And it's useless. No 64-bit support. by pmontra · 2011-03-14 00:00 · Score: 2

How about a link to this rant

http://blog.linuxolution.org/archives/117

Re:And it's useless. No 64-bit support. by Anonymous Coward · 2011-03-14 00:00 · Score: 1

Utter bollocks. I work for a data centre, and there is no way 4GB is *required* for multiple sites or anything like that. How about one server, running 20-odd Linux Jails, each with between 20-32 sites, all in 2GB.

the real question by Anonymous Coward · 2011-03-14 00:24 · Score: 1

The real question is, can anyone afford to install an oracle database on that server?

Re:And it's useless. No 64-bit support. by GeLeTo · 2011-03-14 00:24 · Score: 1

Linus' rant is about using PAE in a desktop environment, which I agree with (that's why I said that I doubt any applications will use PAE). It says nothing about virtualisation. LPAE will work just fine for VMs.

Re:GET SOME PRIORITIES!!! by Shikaku · 2011-03-14 00:35 · Score: 1

And you're posting on Slashdot, instead of flying your private jet to Japan to personally pick up debris and rescue people.

Oh right, only rich people have private jets, a lot of planes won't fly to Japan now, and even if you get a flight, unless you are currently in Japan with a car (most public transportation is down where help would be needed, and most Japanese people don't own cars), you'd have to walk to the disaster areas. You can't do anything except donate money and hope.

Grow up and learn that shit happens, and that your sheltered life can be destroyed in an instant, with little other people can do to help.

Re:And it's useless. No 64-bit support. by Bengie · 2011-03-14 00:36 · Score: 2

64bit memory range? Each node is going to have its own memory slot(s). 120 cores, 4 cores per node = 30 nodes. If you plan to have less than 4GB of memory in this system, how small does each stick have to be when you plug 30 in? ~128MB. Good luck finding a bunch of DDR2/3 128MB sticks to plug into your 4GB 120 core web server. Anyway, each node needs its own local copy of the data it needs to serve up. If your web page needs ~256MB, each node is going to need the same 256MB of data duplicated, plus any extra overhead. You can't expect all 30 nodes to access the same 2-3 memory slots; that would scale like crap. This is one of the issues you get when scaling via cores. Interconnection bandwidth/latency becomes an issue and you need to use local storage to allow fully independent processing. Once you start getting up into these ranges, you're better off thinking of each node as its own computer with a fairly high speed network.
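The arithmetic above, spelled out as a quick sketch. The numbers are the ones from the post; the variable names are just for illustration.

```python
CORES = 120
CORES_PER_NODE = 4
TOTAL_RAM_MB = 4 * 1024   # the 4GB ceiling under discussion
PAGE_DATA_MB = 256        # working data each node needs locally

nodes = CORES // CORES_PER_NODE          # 30 nodes
ram_per_node_mb = TOTAL_RAM_MB / nodes   # roughly 136.5 MB per node
duplicated_mb = PAGE_DATA_MB * nodes     # data held across all nodes

print(nodes, round(ram_per_node_mb, 1), duplicated_mb)
```

So 30 local copies of a 256MB data set already come to 7680MB, well past 4GB in total, even though no single node needs more than a 32-bit address space.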

Re:And it's useless. No 64-bit support. by wvmarle · 2011-03-14 00:48 · Score: 1

Instead of virtualising ten servers on a single physical box, you could of course consider running a single server on a single piece of hardware again. And still win power/flexibility wise if you can get your "low-power" ARM board to cost much less than your souped up x86 board. If only because if a single board fails, just one server goes down. Not all ten.

Re:GET SOME PRIORITIES!!! by .tekrox · 2011-03-14 00:59 · Score: 1

So basically you want Slashdot to turn into every news outlet on earth right now?
If I want to hear more about any of the current natural disasters, the state of Libya or even what lipgloss Jooolia is wearing this week - I'll turn on the Television or read a news-corporation owned website.

This is Slashdot, News for Nerds - just because a disaster happened doesn't mean we stop wanting to know about anything else.

Jeez.

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-14 01:37 · Score: 2

His complaint basically boils down to the fact that the kernel needs to be able to map all of physical memory, and have some address space left over for memory-mapped I/O. This is a valid complaint for a kernel developer (although Linus' 'everyone who disagrees with me is an idiot' style is quite irritating), but it is largely irrelevant to the issue at hand. There is nothing stopping a kernel on ARM with LPAE from using 64-bit pointers internally. You still need to translate userspace pointers, but you need to do that anyway on most architectures (on x86, context switches are insanely expensive, so typically you use a segment for the kernel and run system call handlers without changing the page tables, just making the kernel segment visible by switching to ring 0), so that code already exists in all of the relevant places in the kernel.

--
I am TheRaven on Soylent News

Re:Sounds like my next workstation by Jurily · 2011-03-14 01:51 · Score: 1, Funny

Right now my system doesn't even have 480 live processes on it, let alone ones contending for execution time.

You're obviously not running Gentoo.

Re:And it's useless. No 64-bit support. by Cyberax · 2011-03-14 01:56 · Score: 1

No, the problem is:
1) Kernel is starved for _address_ _space_ for its internal structures.
2) Userspace is starved for address space, because it has to view all the RAM through a small aperture (think EMS in 80286).
3) Constant address space remapping is costly.

And it doesn't matter that you use 64-bit pointers internally, because you can't address data directly.

Language-imposed gratuitous use of floating point by tepples · 2011-03-14 02:03 · Score: 1

But web pages won't even need you to do any floating point arithmetic.

Provided your application is written in a language that supports not-floating-point arithmetic. In PHP, for example, any division returns a floating-point result, as does any computation with numbers over 2 billion (such as the UNIX timestamps of dates past 2038).
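The "over 2 billion" limit is the signed 32-bit integer range. A small sketch of where a UNIX timestamp crosses it, written in Python rather than PHP (Python integers never overflow, so the range check is explicit; the helper name is made up):

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_int32(n):
    """True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX


# The last second a signed 32-bit timestamp can represent.
last = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last.isoformat())  # 2038-01-19T03:14:07+00:00

# A date past that point no longer fits.
y2040 = int(datetime(2040, 1, 1, tzinfo=timezone.utc).timestamp())
print(fits_int32(y2040))  # False
```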

Re:And it's useless. No 64-bit support. by tepples · 2011-03-14 02:21 · Score: 1

On a database server, if it's highly used, is largely stuck on the slowest part (disk i/o) when it has to do full table scans. You solve this by building proper indexes

Until you have to use a DBMS that ignores your indexes. For example, MySQL appears unable to make efficient use of an index on a subquery that uses GROUP BY. From the manual: "A subquery in the FROM clause is evaluated by materializing the result into a temporary table, and this table does not use indexes. This does not allow the use of indexes in comparison with other tables in the query, although that might be useful." The only reason I haven't already rewritten it as a join is that the subquery uses GROUP BY. The workaround I have adopted is to rewrite the query as multiple CREATE TEMPORARY TABLE ... SELECT statements so that as few rows as possible are seen at once. Or is there a better workaround, other than dropping MySQL entirely?
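A minimal sketch of that temporary-table workaround, with the extra step of indexing the temporary table before joining against it. It uses SQLite through Python's sqlite3 module purely as a stand-in so it runs anywhere; it says nothing about how MySQL's optimizer behaves, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 10), (2, 1, 15), (3, 2, 7)]
)

# Step 1: materialize the GROUP BY result explicitly instead of
# leaving it as a subquery in the FROM clause.
conn.execute(
    "CREATE TEMPORARY TABLE order_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)

# Step 2: index the temporary table so the join has something to use.
conn.execute("CREATE INDEX idx_order_totals ON order_totals (customer_id)")

# Step 3: join against the indexed temporary table.
rows = conn.execute(
    "SELECT c.name, t.total FROM customers c "
    "JOIN order_totals t ON t.customer_id = c.id ORDER BY c.name"
).fetchall()
print(rows)
```

Whether the index actually gets used on a given DBMS and version is something to confirm with that system's query plan output.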

Re:And it's useless. No 64-bit support. by Theovon · 2011-03-14 02:30 · Score: 1

I do scientific computing where we regularly use virtual address spaces larger than 4GB. Not all of that is in the working set, of course, but it's often necessary to have that much mapped. One recent example is my leakage power and delay models for near-threshold circuits. I implemented the Markovic formulas and found them to be too slow. My simulations would take days. So, I figured out the granularities I needed for voltage, power, and temperature, and I implemented those models as giant look-up tables. The leakage power model occupies 4GB of address space all by itself. I just mmap the file into the process and go. Now the simulations take only hours.
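The mmap'd look-up table idea can be sketched on a tiny scale like this. The grid sizes, file name, and the placeholder formula are all made up for illustration; the real model and its 4GB table are not described in enough detail to reproduce.

```python
import mmap
import os
import struct
import tempfile

N_VOLT, N_TEMP = 4, 3        # hypothetical grid sizes; real tables are far larger
ENTRY = struct.Struct("<d")  # one float64 per grid point


def build_table(path):
    """Precompute the table once; the formula is a placeholder, not a real model."""
    with open(path, "wb") as f:
        for v in range(N_VOLT):
            for t in range(N_TEMP):
                f.write(ENTRY.pack(v * 10.0 + t))


def lookup(table, v, t):
    """Index straight into the mapped file instead of evaluating the model."""
    offset = (v * N_TEMP + t) * ENTRY.size
    return ENTRY.unpack_from(table, offset)[0]


path = os.path.join(tempfile.mkdtemp(), "leakage.tbl")
build_table(path)
with open(path, "rb") as f:
    table = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    value = lookup(table, 2, 1)
    table.close()
print(value)  # 21.0
```

The whole file is mapped into the address space but only the pages actually touched get read in, which is why the mapping can be much larger than the working set.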

leave britney alone! by luis_a_espinal · 2011-03-14 02:41 · Score: 1

The worst natural disaster in recorded history occurred less than a week ago, and you people are discussing Calxeda's first ARM-based server chip, designed to let companies build low-power servers with up to 480 cores; as the chip is built on a quad-core ARM processor, and low-power servers could have 120 ARM processing nodes in a 2U box; chips will be based on ARM's Cortex-A9 processor architecture???? My *god*, people, GET SOME PRIORITIES!

The bodies of nearly 10,000 dead people could give a good god damn about the advent of LAN parties, your childish Lego models, your nerf toys and lack of a "fun" workplace, your Everquest/Diablo/D&D addiction, or any of the other ways you are "getting on with your life".

I have in-laws and friends in Japan, and thank God they are all fine. But even if something had happened to them, what would you expect me, a /. reader, or anyone, to do? To cut my veins and pour ash on my head? What about the rest of the readers? You are just an attention whore looking for a cause celebre to be upset about. Nothing more, as your little rant does nothing constructive.

You don't know if people reading this donated to the cause. You do not know anything about anyone here, about what they do or feel, and yet you act as if you did.

There is a difference between mourning and empathy, and shameless and useless "leave britney alone" attention whoring. Guess which one describes you buddy.

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-14 03:00 · Score: 1

1) Kernel is starved for _address_ _space_ for its internal structures.

This is addressed by using physical addresses in the kernel, as I said. It can use 64-bit pointers, and the compiler emits direct loads and stores that bypass the MMU.

Userspace is starved for address space, because it has to view all the RAM through a small aperture (think EMS in 80286).

Which is only relevant if the process actually wants more than 4GB of address space, i.e. not very often (yet).

Constant address space remapping is costly

True, but this is only required on x86 because the kernel is using its own virtual address space. This is not an issue on ARM.

--
I am TheRaven on Soylent News

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-14 03:02 · Score: 1

If you are doing scientific computing, then you are not in the target market for a system like this. The virtual address space size is the least of your problems - the relatively anaemic floating point performance is going to cripple your performance.

--
I am TheRaven on Soylent News

Re:And it's useless. No 64-bit support. by MarkRose · 2011-03-14 04:45 · Score: 1

A proper webserver only needs 1 thread per core. Each socket/connection should only consume a few KB of RAM at most. A webserver shouldn't use more than a couple dozen MB of RAM at most, not including the OS file system cache. Look into Nginx or lighttpd.

--
Be relentless!

Re:And it's useless. No 64-bit support. by the+linux+geek · 2011-03-14 05:19 · Score: 2

This kind of arrangement gets brought up over and over - one of the more recent examples is SiCortex, and it sucked. Having a Single System Image is always preferable to a "cluster in a box."

iPad 20...now with 480 cores by schlachter · 2011-03-14 06:11 · Score: 1

Now with 480 cores....2x as fast and with 9x better graphics than the iPad 19.

--
My God can beat up your God. Just kidding...don't take offense. I know there's no God.

Re:And it's useless. No 64-bit support. by Fulcrum+of+Evil · 2011-03-14 07:36 · Score: 1

You're more the target market for a nice G34 AMD system - 24 cores in 2 sockets, 64G of ram. This is more about serving lots of php.

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"

Re:And it's useless. No 64-bit support. by Bengie · 2011-03-14 12:11 · Score: 1

Nice to know :-) If it works as a unified memory, then 2GB per node and 30 nodes is going to be way more than 32bit addressing, but it would be great for distributed work. If each node runs as its own machine, then they will have to have a separate boot drive for each node and each node will have to have some sort of network connection to every other node. Should be interesting once more info comes out.

Re:Sounds like my next workstation by badkarmadayaccount · 2011-03-15 00:02 · Score: 1

Raytracing.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:And it's useless. No 64-bit support. by badkarmadayaccount · 2011-03-15 03:24 · Score: 1

I can't imagine a better workaround than dropping MySQL.

--
I know tobacco is bad for you, so I smoke weed with crack.

Drop MySQL in favor of what? by tepples · 2011-03-15 03:31 · Score: 1

I can't imagine a better workaround than dropping MySQL.

In favor of what? PostgreSQL, or something one has to pay for? Either way, dropping MySQL support in the next version would require a lot of clients to drop their current hosting provider and switch from (cheap) shared hosting to a (more expensive) VPS.

Re:Drop MySQL in favor of what? by badkarmadayaccount · 2011-03-15 04:02 · Score: 1

If only more people ditched MySQL, nobody would bother offering it, and would set some reasonable prices on VPS hosting. Oh, wait... http://www.postgresql.org/support/professional_hosting_northamerica

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:And it's useless. No 64-bit support. by badkarmadayaccount · 2011-03-15 03:53 · Score: 1

AFAIK, most OSes shut down the MMU in kernel mode - Linux for instance. Address space remaps are costly because of a lot of explicit, non-cached memory accesses. Though I don't see why some more PAE bits can't replace 64-bit mode - you just need an IOMMU. And possibly hardware virtualization with a simple hypervisor. Though that might actually be faster, considering all the savings you make from pointers, not to mention that if the MMU and wide load/store instructions trap to the hypervisor directly - the context switch cost is the same as calling the OS.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:And it's useless. No 64-bit support. by badkarmadayaccount · 2011-03-15 03:57 · Score: 1

SSI can be done in the system firmware/hypervisor/kernel. Linux supports it.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-15 05:17 · Score: 1

AFAIK, most OSes shut down the MMU in kernel mode - linux for instance

Linux certainly doesn't do this on x86. It uses the segmentation mechanism. The kernel's memory is in a segment, marked as only visible to ring 0 code. When you make a system call, the current process's segment(s) remain visible to the OS, as does the kernel's segment. This means that you typically have 1GB of address space reserved for the kernel, and 3GB for each userspace process. Red Hat used to ship a kernel that used an entirely separate address space, so you got 4GB for the kernel and 4GB for each userspace app, but this required a TLB flush on each system call (in and out) so it was quite slow.

The problems with PAE are not really problems with PAE, so much as they are problems with the completely hatstand memory architecture of x86.

--
I am TheRaven on Soylent News

Re:And it's useless. No 64-bit support. by badkarmadayaccount · 2011-03-15 09:05 · Score: 1

Paging is shut down. The MMU does paging. The segmentation mechanism is separate.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:And it's useless. No 64-bit support. by TheRaven64 · 2011-03-15 11:23 · Score: 1

There are so many things wrong with that, that I don't even know where to start. The MMU on x86 handles both paging and segmentation. Segments map from virtual addresses to linear addresses. Paging maps from linear addresses to physical addresses. Both are part of the virtual memory mapping handled by the MMU, which first walks the LDT / GDT, then the page tables, to translate from a virtual address to the physical.
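As a toy model of that two-stage translation, here is a minimal sketch. It is heavily simplified: a segment is just a (base, limit) pair and the page table is a flat dictionary, whereas real x86 uses descriptor tables and multi-level page tables. All the values are hypothetical.

```python
PAGE_SIZE = 4096


class SegmentFault(Exception):
    """Raised when an offset lies beyond the segment limit."""


def segment_to_linear(segment, offset):
    """Stage 1, segmentation: (base, limit) plus offset gives a linear address."""
    base, limit = segment
    if offset > limit:
        raise SegmentFault(hex(offset))
    return base + offset


def linear_to_physical(page_table, linear):
    """Stage 2, paging: page number is mapped to a frame, offset is kept."""
    page, page_offset = divmod(linear, PAGE_SIZE)
    frame = page_table[page]  # a KeyError here stands in for a page fault
    return frame * PAGE_SIZE + page_offset


def translate(segment, page_table, offset):
    """Run both stages in the order the MMU applies them."""
    return linear_to_physical(page_table, segment_to_linear(segment, offset))


segment = (0x10000, 0xFFFF)            # hypothetical base and limit
page_table = {0x10: 0x80, 0x11: 0x2A}  # hypothetical page -> frame mapping
print(hex(translate(segment, page_table, 0x1234)))  # 0x2a234
```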

It sounds like you're repeating something that you heard and didn't understand. What you probably heard was that kernel memory can't be swapped - it is always resident in RAM, not paged (swapped) out to disk. This used to be true for Linux, but hasn't been for a while - recent kernels (as in, ones from about the last ten years) can swap some kernel memory out - but not all of it (for example, swapping out the VM subsystem would be a really bad idea, since you wouldn't be able to swap it back in. Swapping out interrupt handlers would also break things).

--
I am TheRaven on Soylent News

Re:And it's useless. No 64-bit support. by badkarmadayaccount · 2011-03-17 01:23 · Score: 1

You got me - though I've never actually heard of the MMU being used in the kernel.

--
I know tobacco is bad for you, so I smoke weed with crack.
