Troy+Baer · Slashdot Mirror

The article title is demonstrably wrong... on Big Mac achieves around 14 TFlops with 128 Nodes · 2003-10-16 07:31 · Score: 1

VA Tech must've gotten about 1.6 TFlop/s on 128 nodes. 14 TFlop/s is waaaaaay in excess of the peak for 128 dual G5 nodes (by almost an order of magnitude in fact).

Here's how I figure that:

A G5 proc can do 4 64-bit FP ops per clock cycle (2 FPUs, each capable of doing a multiply/add op), so that's 8 GFlop/s per 2GHz proc, or 16 GFlop/s per dual-proc node. For 128 nodes, that's 2.048 TFlop/s peak (128 x 16 GFlop/s). Dongarra said they were getting 80% of peak, which would be 1.638 TFlop/s.
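For anyone who wants to check the arithmetic, here's a minimal Python sketch of the same back-of-the-envelope calculation. The function name and parameter names are made up for illustration; the inputs (4 FP ops per cycle, 2 GHz, dual-proc nodes, 80% of peak) are the figures quoted above.

```python
def peak_tflops(nodes, procs_per_node=2, clock_ghz=2.0, flops_per_cycle=4):
    """Theoretical peak in TFlop/s for a cluster of identical nodes."""
    gflops_per_proc = clock_ghz * flops_per_cycle        # 8 GFlop/s per 2 GHz G5
    gflops_per_node = gflops_per_proc * procs_per_node   # 16 GFlop/s per dual node
    return nodes * gflops_per_node / 1000.0

peak = peak_tflops(128)
sustained = 0.80 * peak   # the 80%-of-peak figure quoted above

print(f"peak      = {peak:.3f} TFlop/s")
print(f"sustained = {sustained:.3f} TFlop/s")
print(f"14 TFlop/s would be {14 / peak:.1f}x peak")
```

Plugging in 1100 nodes instead of 128 gives a peak of about 17.6 TFlop/s, which is the only way a number like 14 TFlop/s starts to look plausible.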

The thing that I *still* haven't heard explained in all the reporting on the VA Tech G5 cluster is how they're scaling the interconnect out to 1100 nodes. The biggest InfiniBand switch you can get right now is 128 ports, so somehow they're going to have to slave a whole bunch of IB switches together. There's a lot of ways they could do that, but the cheapest way (with minimal bandwidth between switches) could really hurt them on big parallel codes like the Parallel Linpack Benchmark. It's even worse for codes like parallel FFTs that are bound by bisection bandwidth (where all the nodes are talking to all the other nodes at full bandwidth simultaneously). It's not clear to me if VA Tech is planning on putting in an IB network that preserves bisection bandwidth or not.
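To put rough numbers on the switch problem, here's a sketch that estimates switch counts for a two-level (leaf plus spine) fabric built out of 128-port switches. The two-level topology, the function name, and the oversubscription parameter are all assumptions for illustration; nothing here reflects what VA Tech is actually planning, and it ignores details like how evenly uplinks divide among the spine switches.

```python
import math

def two_level_fat_tree(hosts, ports=128, oversubscription=1.0):
    """Estimate switch counts for a leaf/spine fabric.

    oversubscription is the number of host ports per uplink port on each
    leaf switch: 1.0 preserves full bisection bandwidth, larger values
    trade bisection bandwidth for fewer switches.
    """
    # Split each leaf switch's ports between hosts (down) and spines (up).
    up = math.floor(ports / (1.0 + oversubscription))
    down = ports - up
    leaves = math.ceil(hosts / down)
    spines = math.ceil(leaves * up / ports)
    return {"leaves": leaves, "spines": spines,
            "hosts_per_leaf": down, "uplinks_per_leaf": up}

full = two_level_fat_tree(1100)                          # full bisection
cheap = two_level_fat_tree(1100, oversubscription=3.0)   # 3:1 oversubscribed

print("full bisection:", full)
print("3:1 oversubscribed:", cheap)
```

Under these assumptions, full bisection bandwidth takes 18 leaf and 9 spine switches (27 total), while the 3:1 oversubscribed version gets by with 12 leaf and 3 spine switches (15 total). That gap is the cost pressure that could tempt someone into the cheap option.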

IB vs. Myrinet, Quadrics on Virginia Tech Announces Supercomputer Plans · 2003-09-03 09:05 · Score: 1

The numbers I've seen seem to suggest that a 128-host IB fabric is only slightly more expensive than a Myrinet fabric of the same size, with similar latency characteristics for MPI programs and about 3x the bandwidth per host. Quadrics has better latency characteristics than IB, but about half the per-host bandwidth; I'm not sure about relative cost.

What I'll be very curious to see is how Apple and VT scale an IB fabric out to 1100 hosts. They'll either have to buy a boatload of "backplane" switches or sacrifice bisection bandwidth across the system as a whole. That'll probably depend on whether they have big bisection-bandwidth-bound codes, like parallel FFTs...

Re:Call the FTC! on SCO: Code Proof Analyzed, Linus Interviewed · 2003-08-21 07:02 · Score: 1

I had a similar experience with the Ohio AG when I inquired if the SCO threat letters fell under Ohio's RICO laws. They basically said "They don't do business in Ohio, so talk to the feds."

Re:Binary version of Linux? on SCO Extorting Unixware Licenses to Linux Users? · 2003-07-22 03:26 · Score: 5, Insightful

I'm not sure if it's legal either, but it sure reeks of a protection racket to me. (It's especially galling given that they haven't even established in court that they do in fact own what they claim to.) I've complained to my state attorney general about it. I'd like to think they'll look into it, but my state AG is one of the ones who caved on the MS antitrust settlement...

The meat of the complaint appears to be pts. 50-55 on More on SCO vs. IBM Lawsuit · 2003-03-07 08:19 · Score: 2, Interesting

Here's the relevant section:

That's all well and good, but it blatantly ignores a couple of hard truths of the marketplace at the time:

The architecture of the Itanium is so different from x86 that SCO's knowledge of previous Intel architectures was either useless or an active hindrance to development.
By May 2001, it was obvious to anyone paying attention that Linux was going to be the OS of choice among the early adopters of Itanium. A significant percentage (20% at least) of all Itanium-1 processors produced were going into compute clusters at academic HPC sites like NCSA and OSC (my employer), and those sites were all using Linux on their Itaniums. (NCSA bought their Itanium gear from IBM, IIRC.)

I think this suit is going to come down to SCO needing to prove the sentence italicized above, including identifying the trade secrets that IBM supposedly misappropriated. I have to think that's going to be extremely hard to prove, especially given that IBM wasn't even substantially involved in the Linux/ia64 port. (HP did most of the heavy lifting there, I think.)

Personally, I think Bruce Perens is right -- SCO is trying to get someone with deep pockets to buy them, whether that's IBM, MS, or somebody else.

Re:Future of Supercomputing on Forget Moore's Law? · 2003-02-11 03:47 · Score: 5, Informative

It's worth noting that the Earth Simulator is actually a cluster of vector mainframes (NEC SX-6s) using a custom interconnect. You could do something similar with the Cray X-1 if you had US$400M or so to spend.

If you're referring to the article I think you are, it was specifically talking in the context of weather simulation -- an application area where vector systems are known to excel (hence why the Earth Simulator does so well at it). The problem is that vector systems aren't always as cost-effective as clusters for a highly heterogeneous workload. With vector systems, a good deal of the cost is in the memory subsystem (often capable of several 10s of GB/s in memory bandwidth), but not every application needs heavy-duty memory bandwidth. Where I work, we've got benchmarks that show a cluster of Itanium-2 systems wiping the walls with a vector machine for some applications (specifically structural dynamics and some types of quantum chemistry calculations), and others where a bunch of cheap AMDs beat everything in sight (on some bioinformatics stuff). It all depends on what your workload is.

Almost certainly 12ax7s... on THG Looks at ClawHammer Mobo · 2002-10-18 05:46 · Score: 2, Informative

12ax7s would certainly make sense, as they're still in production in several places (Russia, China, Yugoslavia) and thus relatively cheap. They're also widely used in preamps of guitar amplifiers, so you can find them at your local Guitar Center...

The EF86 was popular for hi-fi preamp applications like this in the '50s and '60s because they had lots of clean headroom, but they're not used as much any more because the ones still in production have a nasty habit of being microphonic. You'd also need twice as many of them, since they're a single pentode in roughly the same bottle as a 12ax7.

Write your congresscritters, folks! on Lofgren's Anti-DRM Bill · 2002-10-03 02:58 · Score: 3, Informative

If you want this bill (or something like it) passed, you have to let your House Rep. and Senators know that you consider it important. A short email or, better yet, snail-mail message will work wonders; here's the one I sent off to the Ohio congresscritters:

The Access Grid uses hardware to do this... on Software Based Echo Cancellation? · 2002-05-08 05:27 · Score: 3, Interesting

The Access Grid is a project started at Argonne National Lab's Math and Computer Science Division to build a mostly open videoconferencing system over the Internet, using multicast audio and video streaming. You may want to take a look at their technology to see if they have ideas you can use.

Anyway, a "node" on the Access Grid consists of a room with at least three computers: a multihead box running Win2k for display to several video projectors, a computer running Linux for audio capture and playback, and another running Linux for video capture. The audio capture machine usually runs into a Gentner AP400, which does echo cancellation as well as phone bridging.

I don't know of anybody who has software that does this; sorry.

Re:Advertizing? on Miyazaki's Future w/ Disney · 2001-08-25 00:56 · Score: 2, Insightful

I've never seen an ad for Mononoke (though, of course, I own the DVD). It's interesting that Disney's complaining about sales but has never really pushed the film....

Part of the problem with Disney's handling of Princess Mononoke was that they distributed it like they would an arthouse movie (i.e. relatively limited release, not much promotion), so that it was virtually guaranteed to not make money. Where I live, only the little hole-in-the-wall arthouse theaters showed it -- the big cineplexes didn't.

What happens is that the corps have multiple tiers of products, and they really only put their energy into promoting their top-tier stuff in terms of expected ROI. This is kinda circular -- most times something won't become really popular unless it's promoted, but it won't be promoted unless the corps think it'll be really popular... This also shows up in the music biz -- take a look at how Hollywood Records (the Mouse again) promoted Queen's last album before Freddie Mercury died (read: they didn't), or how Elektra promotes 2nd-tier but solid-selling bands like Dream Theater (read: they don't). In DT's case, they've actually had trouble convincing Elektra to let them make new records because they don't go gold in the US right away, even though they do huge business in Europe and Japan.

Re:Why should I _care_ about having the same memor on SGI Installs First Itanium Cluster At OSC · 2001-08-10 16:16 · Score: 1

But you cannot use your benchmarks to claim that P4 is faster than Athlon. Your benchmarks show that P4 with RDRAM is faster than Athlon with DDR DRAM. This does not imply that P4 is faster than Athlon, which is what you were trying to claim.

Actually, I was providing a counter-example to a specific comment of yours:

I can get Athlon-specific optimizations if I use the Portland Group compilers (-tp athlon -Mvect=prefetch), so claiming that "AMD has never had software optimized for their CPUs" is not entirely true. Certainly they're not as well developed as the P4 optimizations in Intel's compilers, but they do exist.

Also, you cannot make such a claim based on only one benchmark.

I have a hard time viewing the seven different codes in the NAS Parallel Benchmarks as "one benchmark"; the algorithms in the CG (conjugate gradient) benchmark for instance are very different from those in FT (Fourier transform) or IS (integer sort), and they stress different parts of the architecture. They do all favor high memory bandwidth systems, but that's the nature of the beast in HPC -- almost all scientific applications are memory bandwidth hungry.

Why should I _care_ about having the same memory? on SGI Installs First Itanium Cluster At OSC · 2001-08-10 13:38 · Score: 1

OK, I was not aware of that, but tell me where I can actually buy them. I looked at Dell's site before I posted. If Dell doesn't have them, nobody else does.

Had you looked a little further at the URL I posted, it had a link to the company we bought both systems from, Ace Computers. Dell is dragging their feet on dual P4s for some reason; we tried to buy a dual P4 system from them, but they wouldn't ship us one until October and couldn't explain the delay. Smaller vendors like Ace were much more helpful.

You pointed to the single (type of) benchmark that makes P4 look good, though not because of P4's virtues. P4 has two channels of RDRAM giving it 1600 * 2 = 3200 MB/s of memory bandwidth. The Athlon machine in the benchmark you pointed to had a single channel of DDR DRAM, giving it 2128MB/s of memory bandwidth. Fluid Dynamic simulation is one of the _few_ things that can really take advantage of greater memory bandwidth. Therefore P4 wins this round by riding on Rambus's coat-tails.

And I suppose the fact that the P4's memory bus interface (256 bits wide) is twice as wide as the Athlon's (128 bits wide) has no influence on the memory bandwidth available? Or that I can't get an optimizing compiler for the Athlon that's comparable to the Intel Fortran compiler?

Prove me wrong. Point me to the same benchmark where both P4 and Athlon have the same type of memory (both support SDRAM now, you know).

Look, I don't have the luxury of caring which processor is better in a "fair" test with the same memory, etc. -- my job is to figure out which system (processor/memory/IO/etc.) is fastest for our users' applications, which happen to look a lot like NPB. We got one each of the "current best of breed" in dual P4s and dual Athlons, ran the same codes on each, and assembled the results. For single-processor stuff, the P4 wins, sometimes by a lot. For I/O bound stuff, the Athlon wins right now, mainly because Intel couldn't design a decent PCI bridge to save their lives. (This is a matter of the sorta-broken [Athlon] beating the completely-broken [P4]; I fully expect a dual P4 board with a ServerWorks chipset to hand the Athlon its butt.)

I'll be the first to admit that my benchmarks comparison page needs work -- I haven't had time to run Gaussian or GAMESS (computational chemistry codes) yet, or BLAST (a bioinformatics code) for that matter. I know of a similar study that had completely different results, I think largely because they couldn't use the Intel compilers due to some language extensions used by their codes (e.g. Cray-style pointers) that the Intel compilers don't support.

You want numbers, here're some numbers... on SGI Installs First Itanium Cluster At OSC · 2001-08-10 07:29 · Score: 1

I was hoping to avoid this pissing match, but these claims are too ridiculous to let pass.

If software was optimized for P4 and Athlon, P4 would still be pathetic.

The NAS Parallel Benchmarks would seem to indicate you are wrong.

A Xeon is based on P3 core. Xeon is just an overpriced P3 with a larger L2 cache. My claim stands: there is no such thing as dual P4. If and when Xeons with P4 core are released, then we'll talk.

P4 Xeons have been available for several weeks now; we've had a dual P4 machine on site for about a month. Here's the /proc/cpuinfo from it:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65
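If you'd rather not eyeball that output, here's a small Python sketch that pulls the interesting fields out of /proc/cpuinfo-style text. The function name is made up, and the sample string is an abbreviated copy of the listing above (most fields and flags trimmed). The fields that matter are "cpu family" (15 is the P4 core; the P3 reports family 6) and the sse2 flag, which the P3 doesn't have.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines between processors
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # a "processor" line starts a new record
        if cpus:
            cpus[-1][key] = value
    return cpus

# Abbreviated sample; on a Linux box, read the real file instead.
sample = """\
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme sse sse2

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme sse sse2
"""

cpus = parse_cpuinfo(sample)
for cpu in cpus:
    has_sse2 = "sse2" in cpu["flags"].split()
    print(cpu["processor"], cpu["model name"],
          "family", cpu["cpu family"],
          "sse2" if has_sse2 else "no sse2")
```

On a live system, swapping the sample for open("/proc/cpuinfo").read() should work the same way, assuming the usual "key : value" layout.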

You really ought to check your facts...

Re:How Much Power and A/C for this Itanium Cluster on SGI Installs First Itanium Cluster At OSC · 2001-08-10 02:41 · Score: 1

OSC's Itanium cluster is physically located at the State of Ohio Computing Center (SOCC), which provides all the power and cooling. The power for the cluster is nineteen 30A/220V circuits, which go to a set of Sentry power controllers. The total heat load is about 195k BTU/hr, but the SOCC building is (over-)engineered such that this much of a heat load did not require additional AC capacity.
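For a sense of scale, here's a quick Python sketch converting those figures into kilowatts. The helper names are made up for illustration, and the circuit figure is the nominal maximum (amps times volts), ignoring power factor and any derating of the breakers, so treat it as an upper bound rather than a measured draw.

```python
BTU_PER_HR_PER_WATT = 3.412   # 1 W of heat is about 3.412 BTU/hr

def circuit_capacity_kw(circuits, amps, volts):
    """Nominal capacity of a set of identical circuits, in kW."""
    return circuits * amps * volts / 1000.0

def btu_hr_to_kw(btu_hr):
    """Convert a heat load in BTU/hr to kW."""
    return btu_hr / BTU_PER_HR_PER_WATT / 1000.0

capacity_kw = circuit_capacity_kw(19, 30, 220)
heat_kw = btu_hr_to_kw(195_000)

print(f"nominal circuit capacity: {capacity_kw:.1f} kW")
print(f"heat load:                {heat_kw:.1f} kW")
print(f"heat load / capacity:     {heat_kw / capacity_kw:.0%}")
```

By this estimate the 195k BTU/hr heat load works out to roughly 57 kW, a bit under half of the roughly 125 kW nominal capacity of the circuits.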

Available to anybody with an account there? on SGI Installs First Itanium Cluster At OSC · 2001-08-10 01:13 · Score: 1

OSC had beta Itanium hardware last year as well. The difference is that this machine doesn't require signing a bunch of NDAs to use it.

NCSA has a slightly larger system of (more or less) the same design on order from IBM, and I think the installation is ongoing right now.

--Troy

Not exactly... (Re:It is worth noting...) on SGI Installs First Itanium Cluster At OSC · 2001-08-10 01:07 · Score: 2, Insightful

The person who submitted the story (Troy Baer) is also the admin of the beast.

To give credit where credit is due, the admin of that system is Doug Johnson, who has done an enormous amount of work to get this thing working. I'm just a user support guy who writes lots of documentation and happens to dabble in systems stuff like Maui and PVFS in my Copious Spare Time[tm].

Re:The *first* Itanium cluster?!? on SGI Installs First Itanium Cluster At OSC · 2001-08-10 01:02 · Score: 1

If CSAR had it before June of this year, it was prerelease, NDA-ridden, not-ready-for-prime-time hardware that was not supposed to be available for public use. (OSC has had a few Itanium boxes in house since September as well, but we didn't publicly advertise that fact since they were test systems.) OSC's machine is the first cluster of Itaniums that you don't have to sign a ton of NDAs with Intel to use.

I did read the PDF in that URL, BTW, and I'd love to know how you got SGI and Intel to extend the NDAs to CSAR's entire userbase -- we asked SGI about this at one point, and they just about had kittens.

Re:Cplant vs Bproc on "Cplant" Parallel Computing Tool · 2001-06-08 04:27 · Score: 1

Well, there's the concept of Beowulf clusters (free/open source software plus dedicated commodity hardware), and then there's the Beowulf software from Scyld. Just because someone's not using Bproc doesn't mean it's not a Beowulf cluster, and just because it works for your 10 node cluster doesn't mean it scales to the 1400 nodes in Cplant. I wouldn't go so far as to call Bproc a standard, either. MPI is certainly the standard for message passing (and my understanding is that the Cplant stuff supports MPI), but Bproc? Bproc isn't that much different from arrayd on SGI systems or RMS on the Compaq Alphaserver SC... or yod on Cplant, for that matter. It's just a way to start and control processes.

The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.

You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.

(Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)

Sun and GNOME (was Re: Motif *is* dead.) on The Superior Motif? · 2001-05-23 01:36 · Score: 1

NONE of the major Unix vendors (Sun,HP,Compaq,IBM) have switched ANYTHING to GNOME. Sun and HP have announced they will provide GNOME 2.0 when it is released and STABLE.

<sarcasm> So that's why the Sunrays attached to the loaner Daktari system Sun had in here last week had GNOME 1.2 on them...</sarcasm>

News flash: Sun is shipping GNOME NOW.

He's a professional writer who's also a nut... on Harlan Ellison on Copyright Infringement · 2001-03-07 10:40 · Score: 2

Wow, that rant was about the worst thing one could expect from a professional writer.
What happened to eloquence and subtlety?
What happened to NOT drowning out one's own point by using all caps??
What happened to having one's own lawyer review the contents of the release to assure effectiveness??

Haven't read any of his work, have you? Harlan Ellison is about as subtle as the Death Star most of the time, and his only competition for "Most Opinionated Human on the Planet" is George Carlin. Stephen King once described the foreword to Harlan's book Strange Wine as making him

*Danse Macabre, 1981.

I stopped taking Harlan seriously after seeing three or four of his rants on an "entertainment news" show that the SciFi Channel used to have on. Don't get me wrong, he does good work -- the "City on the Edge of Forever" episode of Star Trek was frickin' brilliant, as were some of the ideas he gave JMS for Babylon 5 -- but IMHO he's kind of a loon.

IA64 (was Re:perhaps there's a reason) on The Silent Kernel Platform War? · 2001-02-13 05:28 · Score: 1

aside from the people who wrote it (and therefore already have a more up-to-date tree than Linus) and are prototyping the silicon, who actually has an IA64 system?

OSC, for one. NCSA, for another. Most of the larger vendors (IBM, HP, Dell, Compaq, SGI, etc.) have a few as well. Plus there's a software simulator from HP that lets you run IA64 programs (including the Linux kernel) on an IA32 box.

They exist, though admittedly only as engineering samples right now. But people do have them.

Maui Scheduler has a simulation mode... on Resources On Practical Job Scheduling? · 2001-02-08 01:42 · Score: 1

Maui Scheduler is an open source scheduler for batch systems like OpenPBS and IBM's LoadLeveler. It started life as a way to get around the braindead FIFO behavior in the default LoadLeveler scheduler.

Anyway, it has a simulation mode where you can feed it jobs (including ones with dependencies) and it will simulate running them. It's a handy way for checking your maui.cfg for pathological cases. :)

By what math is this the largest Linux cluster? on Shell and the World's largest Linux Supercomputer · 2000-12-11 23:30 · Score: 2

AFAIK, CPlant at Sandia is ~1600 Alpha nodes and counting. I think one of the genetic research companies has a 1000+ Intel node cluster (although it's used as a job farm rather than for parallel applications, IIRC).

--Troy

T3E != cluster (was Re:cray os) on Shell and the World's largest Linux Supercomputer · 2000-12-11 23:25 · Score: 1

The latest biggest Cray machines have all been Alpha clusters probably running some clusterable DU variant.

I think you mean the T3E, and while it's based on Alphas, it's about as far from a cluster as you can get and still have a parallel machine. It also runs UNICOS/mk (a microkernel version of UNICOS), not Digital/Tru64/name-of-the-week Unix.

Already being tested... on IBM Itanium Based Systems and Linux · 2000-12-06 04:56 · Score: 1

See this link. SGI had a cluster of 8 dual Itanium systems with Myrinet on the floor of Supercomputing 2000, last month. I know because my code was one of the ones they were demonstrating on it; they've loaned us (OSC) 4 dual Itanium boxes and Myrinet to do porting and development on.

My guess (given there's almost nothing to go on in the article) is that IBM will be selling the same Itanium workstation chassis that SGI, Dell, and everybody else will be.


User: Troy+Baer

Comments · 190