Apple Wins VT in Cost vs. Performance



Posted by pudge on Monday September 8, 2003 @05:27AM from the i-knew-it-all-along dept.

danigiri writes "Detailed notes about a presentation at Virginia Tech have been posted by an attending student. He copied most of the slides from the presentation and wrote down the presenters' comments. He also added some insightful notes and info snippets, such as the fact that Apple offered the cheapest deal on machines with chassis, beating Dell, IBM, and HP. VT is definitely going to use some in-house fault-tolerance software to guard against the odd memory-bit error across such a large amount of non-ECC RAM, along with any other hardware or software glitches. The G5 cluster will begin accepting its first applications around November." mfago adds, "Apple beat Dell, IBM and others based on Cost vs. Performance alone, and it will run Mac OS X because 'there is not enough support for Linux.'"

20 of 105 comments

Interesting by Isbiten · 2003-09-08 05:41 · Score: 3, Interesting

Dell - too expensive [one of the reasons for the project being so "hush hush" was that Dell was exploring pricing options during bidding]

Who could have guessed? ;)

--
I fought the corporate America, and the corporate America bought the law.
1. Re:Interesting by Killigan · 2003-09-09 09:40 · Score: 2, Interesting
  
  I've heard (from a fairly reliable source) that the project was actually postponed for about 5 months while Dell worked on lowering its prices for VT. Dell finally gave its lowest possible price, which still wasn't good enough for VT, so they did indeed give Dell plenty of time to try to beat Apple.
software to solve memory problems? by Anonymous Coward · 2003-09-08 05:49 · Score: 1, Interesting

How does that work?? How does software even KNOW if there was a glitch? Can I get this on my non-ECC Linux box????

IMHO the lack of ECC RAM is the only flaw in an otherwise perfect machine (well that, and the massive HEAT).
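One way software can notice a flipped bit without ECC hardware is to record a checksum of each important block of data while it is known to be good and re-verify it before the data is used. The sketch below assumes that approach (it is not necessarily what VT's in-house software does) and uses Python's `zlib.crc32`; the helper names are hypothetical.

```python
import zlib


def checksum(block: bytes) -> int:
    """CRC-32 of a data block, recorded while the block is known to be good."""
    return zlib.crc32(block)


def is_intact(block: bytes, expected: int) -> bool:
    """Recompute the CRC and compare; a mismatch means the block changed."""
    return zlib.crc32(block) == expected


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit memory error at the given bit position."""
    data = bytearray(block)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


if __name__ == "__main__":
    block = bytes(range(256)) * 16  # 4 KiB of sample data
    saved = checksum(block)
    print(is_intact(block, saved))                   # True
    print(is_intact(flip_bit(block, 12345), saved))  # False
```

CRC-32 catches every single-bit error, but detection only tells you a block is bad; the fault-tolerance layer would still have to recompute the data or roll back to a checkpoint.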
Infiniband ensured latency? by dhall · 2003-09-08 06:32 · Score: 5, Interesting

One of the primary concerns for a multi-node cluster is guaranteed latency among all components within the cluster. It doesn't have to be the fastest; it just needs to ensure consistent, predictable latency across all nodes. IBM can do this with their "wormhole" switch routing on SP and has done this with Myrinet on their Intel X-series clusters.

From most of my reading on Infiniband, it was designed from the ground up as a NAS-style solution rather than for large multi-node cluster computing. I'm curious whether they have any issues with cluster latency.

http://www.nwfusion.com/news/2002/1211sandia.html

The primary timings and white papers I've seen published for Infiniband have been for small clustered filesystem access. Although its burst rate is much higher than Myrinet's, it's hard to find any raw details on its multi-node latency normalization.

I hope it scales, since Intel's solution appears to be less cost prohibitive than some of the other solutions offered on the market, and would really open up the market even for smaller clusters (16-36 node) for business use.
For those in the know by gnuadam · 2003-09-08 06:35 · Score: 3, Interesting

I wonder if by "lack of support in Linux" they're referring to the fact that the fans in the Power Mac are controlled by the operating system? Or the fact that there are relatively few support companies for PPC Linux?

Any insiders care to comment?

--
You say :wq, I say ZZ. Why can't we all just get along?
neat. by pb · 2003-09-08 06:37 · Score: 3, Interesting

Looks like the costs come out to $23,636 per node, or $4727 per machine. According to the Apple Store, an equivalently specced machine (dual proc G5, 160GB HD, 1GB RAM) comes out to just a little over $3,000. I suppose you might want a display on the management machine in each node, but that won't raise the price that much (say, $3,200 per machine instead). So that leaves ~$1,500 per machine for the networking hardware and whatever other expenses.
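The arithmetic above can be checked in a few lines. The 5-machines-per-node figure is an assumption inferred from the ratio of the two quoted numbers ($23,636 / $4,727 ≈ 5), and the $3,200 retail price is the poster's estimate; neither is an official VT figure, and `per_machine_overhead` is a hypothetical helper.

```python
def per_machine_overhead(cost_per_node: float, machines_per_node: int,
                         retail_per_machine: float) -> float:
    """Dollars left per machine after subtracting its retail price."""
    cost_per_machine = cost_per_node / machines_per_node
    return cost_per_machine - retail_per_machine


cost_per_node = 23_636   # quoted per-node cost in dollars
machines_per_node = 5    # assumption: inferred from 23,636 / 4,727
retail = 3_200           # poster's estimate, allowing for some displays

print(round(cost_per_node / machines_per_node))  # 4727
print(round(per_machine_overhead(cost_per_node, machines_per_node, retail)))  # 1527
```

That leaves roughly $1,500 per machine for networking hardware and other expenses, in line with the estimate.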

--
pb Reply or e-mail; don't vaguely moderate.
An interesting tidbit by BortQ · 2003-09-08 06:47 · Score: 4, Interesting

The very last slide states that
Current facility will be followed with a second in 2006
It will be very interesting to see if they also use Macs for any follow-up cluster. If it works out well, this could be the start of a Macintosh push into clustered supercomputers.

--

A Multiplayer Strategy Game for Mac OS X, Windows, and Linux
graphics in science by trillian42 · 2003-09-08 06:59 · Score: 3, Interesting

I am a scientist, and lots of money gets put into transforming the tons of numbers that supercomputers produce into images that make sense to the human brain.

The system doesn't have to be chaotic, just complex:

Watching protein folding simulations.
Watching full 3-D seismic waves propagate through the Earth.
Watching, in general, any kind of 3-D model or simulation of a complex process evolving over time.

A couple links:

The Scripps Institution of Oceanography Visualization Center:
http://siovizcenter.ucsd.edu/library/objects/
The Arctic Region Supercomputing Center:
http://www.arsc.edu/news/mdflex.html
water cooled laptops as blades by goombah99 · 2003-09-08 07:14 · Score: 3, Interesting

The Virginia folks must have one huge room with some massive air handlers to circulate the air that will be trapped behind the towering walls of 1000 4U boxes.
A few years ago I asked Apple if they would be willing to sell me 200 laptops without the screens, disks, video cards, and keyboards. They were interested in helping me build my cluster, but the engineers said a special manufacturing run would actually cost them more than would be saved by deleting the hardware.
My plan was to stack these things on water-cooled chill plates. Basically this would be like a blade.
In my circumstances, adding a well-ventilated computer room to the building I was in would have been prohibitively expensive, but water cooling and a high-density configuration made this very appealing. And if I could have gotten the costs down and reliability up by deleting the screens, keyboards, video, and disks, I'd have had an affordable system with low sysadmin costs.
I still think it's a good idea. Cooling/power costs (including building retrofits) and sysadmin costs can outweigh the differences in purchase price among various cluster configurations. In my building the space alone was >$120/sq foot, so even the footprint mattered.

--
Some drink at the fountain of knowledge. Others just gargle.
Nice rack! by Alex+Reynolds · 2003-09-08 07:33 · Score: 2, Interesting

If they do not fit into a standard rack enclosure, I would be curious to learn what customization was required to rack the G5s.

(Especially seeing as a G5 Xserve will probably be at least several months away -- at least until most of the desktop orders can be filled.)

-Alex
Why was bidding secret? by mTor · 2003-09-08 07:42 · Score: 2, Interesting

Could someone please shed some light on this:
Why so secret? Project started back in February; secret with Dell because of the pricing issues; dealt with vendors individually because bidding wars do not drive the prices down in this case.
Why exactly is that? Is there collusion between the vendors since there are so few of them? Does anyone have any experience with this sector?
1. Re:Why was bidding secret? by mTor · 2003-09-08 09:27 · Score: 2, Interesting
  
  I was actually referring to the last sentence:
  
  "dealt with vendors individually because bidding wars do not drive the prices down in this case."
  
  I don't think they even dealt with Apple until Apple's G5 announcement, but they did deal with other vendors. I'm interested in why VT dealt with all of them individually and why prices do not come down when you deal with them this way. This is why I was alluding to collusion.
Re:Clueless Sysadmins... by confused+one · 2003-09-08 08:34 · Score: 2, Interesting

Actually, you're not totally right. You can spread the rendering job across multiple Radeon chips, each handling only a portion of the display. The performance and depth of the rendering could be greatly enhanced. SGI does something like this...
Re:Clueless Sysadmins... by WasterDave · 2003-09-08 12:07 · Score: 3, Interesting

Not all renders are real time, not all renders are onto a screen.

Now that "consumer" graphics cards run in floating point and have comparatively complex shader engines, it's quite possible to start working on rendering movies etc. with the substantial hardware acceleration these things offer. You don't have to hit 60fps, and you can have as many passes as you like.

Mind you, with 1100 nodes if you can render a frame in 45 seconds .... on a twin G5 with a Radeon 9800 ... then you can render 24fps in real time. Real time lord of the rings, anyone?
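The frame-rate claim is node count divided by per-frame render time, assuming each node renders whole frames independently and ignoring network and storage overhead. A minimal sketch (`cluster_fps` is a hypothetical helper):

```python
def cluster_fps(nodes: int, seconds_per_frame: float) -> float:
    """Aggregate frames per second when every node renders its own frames."""
    return nodes / seconds_per_frame


fps = cluster_fps(1100, 45.0)
print(f"{fps:.1f} fps")  # 24.4 fps
```

This is throughput, not latency: each individual frame still takes 45 seconds from start to finish, so the output stream would run about 45 seconds behind its input.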

Dave

--
I write a blog now, you should be afraid.
Re:Gentoo Linux Runs On The G5 ... from Mac/ by Unregistered · 2003-09-08 12:15 · Score: 2, Interesting

the crazy speeds they are claiming on the G5 running Gentoo Linux

Why, hell, I get blazing speeds with Gentoo on my Athlon; I'd sure hope that you'd get them on the G5 as well :).
Re:ECC FUD by Anonymous Coward · 2003-09-08 12:22 · Score: 1, Interesting

The probability of it being in an OS or application critical (especially given the converging nature of many long running calculations) piece of RAM as opposed to an empty piece of RAM is small.
Errr, what is the point of putting 4+ GB into your cluster nodes if you're not going to use it? This isn't a SETI@home cluster. Seems to me that "long running converging apps" tend to have large datasets associated with them. The higher the data density per node, the less network bandwidth is needed, except for "embarrassingly parallel" computations, which have essentially no communication overhead anyway.
I'll concur on desktops, but they typically don't run apps with large datasets for long periods of time. The applications and data come and go, and the corrupted bits get wiped.
P.S. The cooling systems can fail too. It is also a question of redundancy. You'll also find ECC in "big iron" systems that have significant cooling resources assigned to them. During a fail-over transition, ECC would protect the node until it could be completely transitioned out.
Apple Outshines Dell on Ethics by reporter · 2003-09-08 14:21 · Score: 4, Interesting

Even if Apple computers were to cost slightly more than Dell computers, we should consistently buy the former instead of the latter. Price is only one aspect of any product. There are also ethical considerations. They do not matter much outside of Western society, but they matter a great deal in Western society.
As an American company, Dell is a huge disgrace. Please read the "Environmental Report Card" produced by the Silicon Valley Toxics Coalition. Dell received a failing grade and is little better than Taiwanese companies, which are notorious for destroying the environment and the health of workers. Dell even resorted to prison labor to implement its pathetic recycling program.
... from the desk of the reporter
Re:ECC FUD by Anonymous Coward · 2003-09-08 17:22 · Score: 4, Interesting

2. Even if somehow a non-thermal bit error occurs, each node has 4GB RAM. The probability of it being in an OS or application critical (especially given the converging nature of many long running calculations) piece of RAM as opposed to an empty piece of RAM is small.

Think before you post. The failure rate is constant for each memory chip (actually it goes up a bit with higher capacity due to higher density). Unless you set up the memory to be redundant (which the G5 can't do either...), you will experience MORE errors, since a good OS tries to use the empty memory for things like file buffers.

How many of you are reading this from a desktop without ECC RAM that has an obnoxiously huge uptime? ECC is a non-issue in a well-cooled cluster of desktop cased machines.

Sigh... this is a 2200-CPU *cluster*. Here's a primer on statistics. Assume the probability of a memory error is 0.01% for some time interval (say a week or month). The likelihood of a perfect run is then 99.99% on your single CPU, which is just fine. Running on 2200 CPUs, the probability of not having any errors is 0.9999^2200 ≈ 0.8, or a 20% probability of getting memory-related errors somewhere in the cluster.
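The calculation above treats each CPU as failing independently with the same probability p per interval, so an error-free run on n CPUs has probability (1 - p)^n. A short sketch using the post's illustrative 0.01% figure (`p_any_error` is a hypothetical helper):

```python
def p_any_error(p_single: float, n: int) -> float:
    """Chance that at least one of n independent CPUs hits a memory error."""
    return 1.0 - (1.0 - p_single) ** n


p = 1e-4  # 0.01% per interval, the post's illustrative figure
for n in (1, 2200):
    print(f"{n:>5} CPUs: clean run {1 - p_any_error(p, n):.2f}, "
          f"error somewhere {p_any_error(p, n):.2f}")
```

For 2200 CPUs this prints a clean-run probability of about 0.80 and an error probability of about 0.20, matching the figures in the post.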

The actual numbers aren't important - it might very well be a 0.01% probability of an error per year, but the point is that when you run things in parallel the chance of getting a memory error *somewhere* is suddenly far from negligible.

ECC is a cheap and effective solution that almost eliminates the problem. Incidentally, one of the challenges for IBM with "Blue Gene" is that with their super-high memory density even normal single-bit ECC might not be enough.

But, what do I know - I've only got a PhD from Stanford and not VT....
Re:Clueless Sysadmins... by selderrr · 2003-09-08 19:26 · Score: 2, Interesting

That is useful only if those multiple cards are IN THE SAME MACHINE. In the cluster case, those cards are spread over multiple computers, requiring that you transfer the rendered result over the network to the "master" video card, which sends it to the monitor. I seriously doubt the efficiency of such a solution for realtime display.

--
When will I end this grieving ? When will my future begin ?
Re:Clueless Sysadmins... by davechen · 2003-09-09 00:33 · Score: 3, Interesting

That's not necessarily true. Over at Stanford, one project built a graphics system with 32 PCs that render to a tiled display. Imagine a display made of 1000 monitors in a 40x25 grid. That would be pretty freaking cool.
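Driving a tiled wall means giving each render node its own rectangle of the overall image. A minimal sketch, assuming a uniform grid of identical monitors; the 1280x1024 per-monitor resolution and the `tile_layout` helper are illustrative, not details of the Stanford system:

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in wall pixels


def tile_layout(cols: int, rows: int, tile_w: int, tile_h: int) -> List[Tile]:
    """Return one pixel rectangle per monitor, row by row from the top left."""
    return [(c * tile_w, r * tile_h, tile_w, tile_h)
            for r in range(rows) for c in range(cols)]


tiles = tile_layout(cols=4, rows=2, tile_w=1280, tile_h=1024)
print(len(tiles))  # 8
print(tiles[5])    # (1280, 1024, 1280, 1024)
```

If node i drives monitor i and draws only what falls inside `tiles[i]`, no full-resolution frame has to cross the network to a single master card, which is the bottleneck raised in the previous comment.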