VA Tech must've gotten about 1.6 TFlop/s on 128 nodes. 14 TFlop/s is waaaaaay in excess of the peak for 128 dual G5 nodes (by almost an order of magnitude in fact).
Here's how I figure that:
A G5 proc can do 4 64-bit FP ops per clock cycle (2 FPUs, each capable of doing a multiply/add op), so that's 8 GFlop/s per 2GHz proc, or 16 GFlop/s per dual-proc node. For 128 nodes, that's 2.048 TFlop/s peak. Dongarra said they were getting 80% of peak, which would be 1.638 TFlop/s.
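The arithmetic above is easy to redo for other node counts. A minimal sketch, using only the figures from the post (4 flops per cycle, 2 GHz, dual-proc nodes, 80% of peak); the function and constant names are mine:

```python
# Back-of-the-envelope peak for the 128-node figure discussed above.
FLOPS_PER_CYCLE = 4   # 2 FPUs x (multiply + add), per the post
CLOCK_GHZ = 2.0       # 2GHz G5
PROCS_PER_NODE = 2    # dual-proc nodes
EFFICIENCY = 0.80     # fraction of peak reported for Linpack


def peak_gflops(nodes):
    """Theoretical peak in GFlop/s for a cluster of `nodes` dual G5s."""
    return nodes * PROCS_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE


peak = peak_gflops(128)
sustained = peak * EFFICIENCY
print(f"peak: {peak / 1000:.3f} TFlop/s, sustained: {sustained / 1000:.3f} TFlop/s")
```

Calling peak_gflops(1100) gives the corresponding peak for the full-size machine under the same assumptions.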
The thing that I *still* haven't heard explained in all the reporting on the VA Tech G5 cluster is how they're scaling the interconnect out to 1100 nodes. The biggest InfiniBand switch you can get right now is 128 ports, so somehow they're going to have to slave a whole bunch of IB switches together. There's a lot of ways they could do that, but the cheapest way (with minimal bandwidth between switches) could really hurt them on big parallel codes like the Parallel Linpack Benchmark. It's even worse for codes like parallel FFTs that are bound by bisection bandwidth (where all the nodes are talking to all the other nodes at full bandwidth simultaneously). It's not clear to me if VA Tech is planning on putting in an IB network that preserves bisection bandwidth or not.
I can't seem to find the quote from any of the articles right now, but VT is planning on using an Infiniband interconnect from Mellanox. While I don't know the relative price points, they are touting the fact that this is a high-speed interconnect that's faster than Myrinet or Quadrics at a fraction of the cost. I can't say for sure, since the Infiniband cluster we're helping to build at Stanford is not yet assembled.
The numbers I've seen seem to suggest that a 128-host IB fabric is only slightly more expensive than a Myrinet fabric of the same size, with similar latency characteristics for MPI programs and about 3x the bandwidth per host. Quadrics has better latency characteristics than IB, but about half the per-host bandwidth; I'm not sure about relative cost.
What I'll be very curious to see is how Apple and VT scale an IB fabric out to 1100 hosts. They'll either have to buy a boatload of "backplane" switches or sacrifice bisection bandwidth across the system as a whole. That'll probably depend on whether they have big bisection-bandwidth-bound codes, like parallel FFTs...
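To put rough numbers on that tradeoff, here is a sketch that counts switches for a two-tier (leaf/spine) layout built from 128-port switches. The two-tier layout and the oversubscription ratio are my assumptions for illustration; nothing here reflects what VT is actually planning, and it ignores cabling and whether the spine links divide evenly.

```python
import math


def two_tier_switch_count(hosts, ports, oversubscription=1):
    """Return (leaf_switches, spine_switches) for a two-tier fabric.

    oversubscription=1 keeps full bisection bandwidth: every leaf has as
    many uplinks as host ports. Larger values give each leaf
    `oversubscription` host ports per uplink, which saves switches but
    cuts the bandwidth available between leaves.
    """
    up = ports // (oversubscription + 1)   # uplinks per leaf switch
    down = ports - up                      # host ports per leaf switch
    leaves = math.ceil(hosts / down)
    spines = math.ceil(leaves * up / ports)
    return leaves, spines


full = two_tier_switch_count(1100, 128)        # full bisection bandwidth
cheap = two_tier_switch_count(1100, 128, 3)    # 3:1 oversubscribed
print("full bisection:", full, "3:1 oversubscribed:", cheap)
```

Under these assumptions the full-bisection fabric needs 18 leaf and 9 spine switches, while the 3:1 version gets by with 12 and 3, which is the kind of saving that makes the cheap option tempting.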
I did this last week, and also wrote to the AG in NY (Eliot Spitzer) and my home state AG. I received a reply from Utah yesterday. The gist of the response was that they viewed it as a federal copyright law issue and not in their domain; it was up to the courts to decide or Congress to change the copyright laws. They suggested I contact a congressman.
I had a similar experience with the Ohio AG when I inquired if the SCO threat letters fell under Ohio's RICO laws. They basically said "They don't do business in Ohio, so talk to the feds."
They are still claiming that the Linux kernel (or whatever part of SCO/Linux they are claiming today) contains their code, and that it is being used illegally, however if you give them money then they will ignore your violation. I'm not convinced that this is legal, since it sounds a lot like blackmail to me, but that doesn't seem to stop SCO.
I'm not sure if it's legal either, but it sure reeks of a protection racket to me. (It's especially galling given that they haven't even established in court that they do in fact own what they claim to.) I've complained to my state attorney general about it. I'd like to think they'll look into it, but my state AG is one of the ones who caved on the MS antitrust settlement...
50. As SCO was poised and ready to expand its market and market share for UnixWare targeted to high-performance enterprise customers, IBM approached SCO to jointly develop a new 64-bit UNIX-based operating system for Intel-based processing platforms. This joint development effort was widely known as Project Monterey.
51. Prior to this time, IBM had not developed any expertise to run UNIX on an Intel chip and instead was confined to its Power PC chip.
52. In furtherance of Project Monterey, SCO expended substantial amounts of money and dedicated a significant portion of SCO's development team to completion of the project.
53. Specifically, plaintiff and plaintiff's predecessor provided IBM engineers with valuable information and trade secrets with respect to architecture, schematics, and design of UnixWare and the UNIX Software Code for Intel-based processors.
54. By about May 2001, all technical aspects of Project Monterey had been substantially completed. The only remaining tasks of Project Monterey involved marketing and branding tasks to be performed substantially by IBM.
55. On or about May 2001, IBM notified plaintiff that it refused to proceed with Project Monterey, and that IBM considered Project Monterey to be "dead." In fact, in violation of its obligations to SCO, IBM chose to use and appropriate for its own business the proprietary information obtained from SCO. (emphasis mine)
That's all well and good, but it blatantly ignores a couple of hard truths of the marketplace at the time:
- The architecture of the Itanium is so different from x86 that SCO's knowledge of previous Intel architectures was either useless or an active hindrance to development.
- By May 2001, it was obvious to anyone paying attention that Linux was going to be the OS of choice among the early adopters of Itanium. A significant percentage (20% at least) of all Itanium-1 processors produced were going into compute clusters at academic HPC sites like NCSA and OSC (my employer), and those sites were all using Linux on their Itaniums. (NCSA bought their Itanium gear from IBM, IIRC.)
I think this suit is going to come down to SCO needing to prove the sentence italicized above, including identifying the trade secrets that IBM supposedly misappropriated. I have to think that's going to be extremely hard to prove, especially given that IBM wasn't even substantially involved in the Linux/ia64 port. (HP did most of the heavy lifting there, I think.)
Personally, I think Bruce Perens is right -- SCO is trying to get someone with deep pockets to buy them, whether that's IBM, MS, or somebody else.
Re:Future of Supercomputing, on "Forget Moore's Law?" · Score: 5, Informative
Check out who's on top of the TOP 500 supercomputers. US? Nope. Cluster? Nope. The top computer in the world is the Earth Simulator in Japan. It's not a cluster of lower end processors. It was built from the ground up with one idea -- speed. Unsurprisingly it uses traditional vector processing techniques developed by Cray to achieve this power. And how does it compare with the next in line? It blows them away. Absolutely blows them away.
It's worth noting that the Earth Simulator is actually a cluster of vector mainframes (NEC SX-6s) using a custom interconnect. You could do something similar with the Cray X-1 if you had US$400M or so to spend.
I recently read a very interesting article about this (I can't remember where - I tried googling) which basically stated that the US has lost its edge in supercomputing. The reason was twofold: (1) less government and private funding for supercomputing projects and (2) a reliance on clustering.
If you're referring to the article I think you are, it was specifically talking in the context of weather simulation -- an application area where vector systems are known to excel (hence why the Earth Simulator does so well at it). The problem is that vector systems aren't always as cost-effective as clusters for a highly heterogeneous workload. With vector systems, a good deal of the cost is in the memory subsystem (often capable of several tens of GB/s in memory bandwidth), but not every application needs heavy-duty memory bandwidth. Where I work, we've got benchmarks that show a cluster of Itanium-2 systems wiping the walls with a vector machine for some applications (specifically structural dynamics and some types of quantum chemistry calculations), and others where a bunch of cheap AMDs beat everything in sight (on some bioinformatics stuff). It all depends on what your workload is.
Since six channels are being amplified (5.1) and three tubes are present, I'm assuming they're using three double-triodes in Class A configuration. Maybe 12AX7s? Note to AOpen: people care about this kind of thing.
12AX7s would certainly make sense, as they're still in production in several places (Russia, China, Yugoslavia) and thus relatively cheap. They're also widely used in preamps of guitar amplifiers, so you can find them at your local Guitar Center...
The EF86 was popular for hi-fi preamp applications like this in the '50s and '60s because it had lots of clean headroom, but it's not used as much any more because the ones still in production have a nasty habit of being microphonic. You'd also need twice as many of them, since each is a single pentode in roughly the same bottle as a 12AX7.
If you want this bill (or something like it) passed, you have to let your House Rep. and Senators know that you consider it important. A short email or, better yet, snail-mail message will work wonders; here's the one I sent off to the Ohio congresscritters:
To: senator_DeWine@DeWine.senate.gov, senator_voinovich@exchange.senate.gov, pryce.oh15@mail.house.gov
Subject: Lofgren bill
Honorable Senators and Congresswoman:
I am writing you today to express my support for a bill recently introduced into the House by Congresswoman Zoe Lofgren (which unfortunately does not yet have a congressional record number that I can find), entitled "The Digital Choice and Freedom Act of 2002". This bill would modify the copyright laws to reinforce consumers' fair use rights, which were eroded by the Digital Millennium Copyright Act (DMCA) passed in 1998.
This legislation is desperately needed. Copyright law in recent years has been heavily tilted in favor of copyright holders, and fair use rights and the public domain have suffered as a result. This act would return some semblance of sanity to some of the more draconian aspects of the DMCA, which disallows circumvention of access controls on copyrighted digital works even if that circumvention is needed to make fair use of the work. (This means, for instance, that it is potentially illegal to develop and distribute an open source DVD player for Linux, because any such player must circumvent the access controls built into the DVD format. This leads to the absurd situation where I can legally buy a DVD and a DVD player/drive for a computer running Linux, but I may not legally be able to play the DVD on the computer.)
Please lend your support to Congresswoman Lofgren's bill. Thank you for your time and attention in the matter.
The Access Grid is a project started at Argonne National Lab's Math and Computer Science Division to build a mostly open videoconferencing system over the Internet, using multicast audio and video streaming. You may want to take a look at their technology to see if they have ideas you can use.
Anyway, a "node" on the Access Grid consists of a room with at least three computers: a multihead box running Win2k for display to several video projectors, a computer running Linux for audio capture and playback, and another running Linux for video capture. The audio capture machine usually runs into a Gentner AP400, which does echo cancellation as well as phone bridging.
I don't know of anybody who has software that does this; sorry.
I've never seen an ad for Mononoke (though, of course, I own the DVD). It's interesting that Disney's complaining about sales but has never really pushed the film....
Part of the problem with Disney's handling of Princess Mononoke was that they distributed it like they would an arthouse movie (i.e. relatively limited release, not much promotion), so that it was virtually guaranteed to not make money. Where I live, only the little hole-in-the-wall arthouse theaters showed it -- the big cineplexes didn't.
What happens is that the corps have multiple tiers of products, and they really only put their energy into promoting their top-tier stuff in terms of expected ROI. This is kinda circular -- most times something won't become really popular unless it's promoted, but it won't be promoted unless the corps think it'll be really popular... This also shows up in the music biz -- take a look at how Hollywood Records (the Mouse again) promoted Queen's last album before Freddie Mercury died (read: they didn't), or how Elektra promotes 2nd-tier but solid-selling bands like Dream Theater (read: they don't). In DT's case, they've actually had trouble convincing Elektra to let them make new records because they don't go gold in the US right away, even though they do huge business in Europe and Japan.
But you cannot use your benchmarks to claim that P4 is faster than Athlon. Your benchmarks show that P4 with RDRAM is faster than Athlon with DDR DRAM. This does not imply that P4 is faster than Athlon, which is what you were trying to claim.
Actually, I was providing a counter-example to a specific comment of yours:
If software was optimized for P4 and Athlon, P4 would still be pathetic.
I can get Athlon-specific optimizations if I use the Portland Group compilers (-tp athlon -Mvect=prefetch), so claiming that "AMD has never had software optimized for their CPUs" is not entirely true. Certainly they're not as well developed as the P4 optimizations in Intel's compilers, but they do exist.
Also, you cannot make such a claim based on only one benchmark.
I have a hard time viewing the seven different codes in the NAS Parallel Benchmarks as "one benchmark"; the algorithms in the CG (conjugate gradient) benchmark for instance are very different from those in FT (Fourier transform) or IS (integer sort), and they stress different parts of the architecture. They do all favor high memory bandwidth systems, but that's the nature of the beast in HPC -- almost all scientific applications are memory bandwidth hungry.
OK, I was not aware of that, but tell me where I can actually buy them. I looked at Dell's site before I posted. If Dell doesn't have them, nobody else does.
Had you looked a little further at the URL I posted, it had a link to the company we bought both systems from, Ace Computers. Dell is dragging their feet on dual P4s for some reason; we tried to buy a dual P4 system from them, but they wouldn't ship us one until October and couldn't explain the delay. Smaller vendors like Ace were much more helpful.
You pointed to the single (type of) benchmark that makes P4 look good, though not because of P4's virtues. P4 has two channels of RDRAM giving it 1600 * 2 = 3200 MB/s of memory bandwidth. The Athlon machine in the benchmark you pointed to had a single channel of DDR DRAM, giving it 2128MB/s of memory bandwidth. Fluid Dynamic simulation is one of the _few_ things that can really take advantage of greater memory bandwidth. Therefore P4 wins this round by riding on Rambus's coat-tails.
And I suppose the fact that the width of the P4's memory bus interface (256 bits wide) is twice that of the Athlon (128 bits wide) has no influence on the memory bandwidth available? Or that I can't get an optimizing compiler for the Athlon that's comparable to the Intel Fortran compiler?
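For reference, the peak figures quoted in the parent comment (3200 MB/s and 2128 MB/s) can be reproduced as transfer rate times channel width times channel count. This sketch assumes PC800 RDRAM (800 MT/s, 2 bytes per channel) and DDR266 (266 MT/s, 8 bytes per channel); the comment doesn't name the parts, so treat those as my guesses that happen to match the numbers.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Theoretical peak memory bandwidth in MB/s."""
    return mega_transfers_per_s * bytes_per_transfer * channels


# Assumed parts; the quoted comment only gives the resulting MB/s figures.
rdram = peak_bandwidth_mb_s(800, 2, channels=2)   # dual-channel PC800 RDRAM
ddr = peak_bandwidth_mb_s(266, 8)                 # single-channel DDR266

print(f"P4/RDRAM: {rdram} MB/s, Athlon/DDR: {ddr} MB/s, ratio {rdram / ddr:.2f}x")
```

These are paper numbers only; what a benchmark like NPB actually sees also depends on the chipset and the compiler.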
Prove me wrong. Point me to the same benchmark where both P4 and Athlon have the same type of memory (both support SDRAM now, you know).
Look, I don't have the luxury of caring which processor is better in a "fair" test with the same memory, etc. -- my job is to figure out which system (processor/memory/IO/etc.) is fastest for our users' applications, which happen to look a lot like NPB. We got one each of the "current best of breed" in dual P4s and dual Athlons, ran the same codes on each, and assembled the results. For single-processor stuff, the P4 wins, sometimes by a lot. For I/O bound stuff, the Athlon wins right now, mainly because Intel couldn't design a decent PCI bridge to save their lives. (This is a matter of the sorta-broken [Athlon] beating the completely-broken [P4]; I fully expect a dual P4 board with a ServerWorks chipset to hand the Athlon its butt.)
I'll be the first to admit that my benchmarks comparison page needs work -- I haven't had time to run Gaussian or GAMESS (computational chemistry codes) yet, or BLAST (a bioinformatics code) for that matter. I know of a similar study that had completely different results, I think largely because they couldn't use the Intel compilers due to some language extensions used by their codes (e.g. Cray-style pointers) that the Intel compilers don't support.
A Xeon is based on P3 core. Xeon is just an overpriced P3 with a larger L2 cache. My claim stands: there is no such thing as dual P4. If and when Xeons with P4 core are released, then we'll talk.
P4 Xeons have been available for several weeks now; we've had a dual P4 machine on site for about a month. Here's the /proc/cpuinfo from it:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65
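If you want to check a listing like the one above programmatically, here's a minimal sketch that splits /proc/cpuinfo-style text into one record per processor. The parser and the trimmed sample are mine (only a few fields are copied from the output above). The things to look at are "cpu family : 15" and the sse2 flag, which P6-family (family 6) parts like the P3 don't report.

```python
def parse_cpuinfo(text):
    """Return one dict per logical processor from /proc/cpuinfo-style text."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # each "processor" line starts a new record
        if cpus:
            cpus[-1][key] = value
    return cpus


# Trimmed sample with a few fields from the listing above.
sample = """\
processor : 0
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme sse sse2 ss tm
processor : 1
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme sse sse2 ss tm
"""

cpus = parse_cpuinfo(sample)
for cpu in cpus:
    has_sse2 = "sse2" in cpu["flags"].split()
    print(cpu["processor"], cpu["model name"], "family", cpu["cpu family"], "sse2:", has_sse2)
```

On a live Linux box you'd pass it open("/proc/cpuinfo").read() instead of the sample string.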
OSC's Itanium cluster is physically located at the State of Ohio Computing Center (SOCC), which provides all the power and cooling. The power for the cluster is nineteen 30A/220V circuits, which go to a set of Sentry power controllers. The total heat load is about 195k BTU/hr, but the SOCC building is (over-)engineered such that this much of a heat load did not require additional AC capacity.
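For anyone who wants those figures in common units, a quick conversion sketch. It uses only the numbers above plus the standard 1 W = 3.412 BTU/hr factor; note that the circuit figure is provisioned capacity (amps times volts), not measured draw.

```python
BTU_PER_HR_PER_WATT = 3.412   # 1 W is about 3.412 BTU/hr

circuits, amps, volts = 19, 30, 220   # nineteen 30A/220V circuits
heat_btu_hr = 195_000                 # quoted heat load

# Provisioned capacity in kVA, and the heat load converted to kW.
circuit_capacity_kva = circuits * amps * volts / 1000
heat_kw = heat_btu_hr / BTU_PER_HR_PER_WATT / 1000

print(f"{circuit_capacity_kva:.1f} kVA provisioned, {heat_kw:.1f} kW of heat")
```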
The person who submitted the story (Troy Baer) is also the admin of the beast.
To give credit where credit is due, the admin of that system is Doug Johnson, who has done an enormous amount of work to get this thing working. I'm just a user support guy who writes lots of documentation and happens to dabble in systems stuff like Maui and PVFS in my Copious Spare Time[tm].
If CSAR had it before June of this year, it was prerelease, NDA-ridden, not-ready-for-prime-time hardware that was not supposed to be available for public use. (OSC has had a few Itanium boxes in house since September as well, but we didn't publicly advertise that fact since they were test systems.) OSC's machine is the first cluster of Itaniums that you don't have to sign a ton of NDAs with Intel to use.
I did read the PDF in that URL, BTW, and I'd love to know how you got SGI and Intel to extend the NDAs to CSAR's entire userbase -- we asked SGI about this at one point, and they just about had kittens.
Well, there's the concept of Beowulf clusters (free/open source software plus dedicated commodity hardware), and then there's the Beowulf software from Scyld. Just because someone's not using Bproc doesn't mean it's not a Beowulf cluster, and just because it works for your 10 node cluster doesn't mean it scales to the 1400 nodes in Cplant. I wouldn't go so far as to call Bproc a standard, either. MPI is certainly the standard for message passing (and my understanding is that the Cplant stuff supports MPI), but Bproc? Bproc isn't that much different from arrayd on SGI systems or RMS on the Compaq Alphaserver SC... or yod on Cplant, for that matter. It's just a way to start and control processes.
The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.
You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.
(Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)
--Troy
Sun and GNOME (was Re: Motif *is* dead.), on "The Superior Motif?" · Score: 1
NONE of the major Unix vendors (Sun,HP,Compaq,IBM) have switched ANYTHING to GNOME. Sun and HP have announced they will provide GNOME 2.0 when it is released and STABLE.
<sarcasm> So that's why the Sunrays attached to the loaner Daktari system Sun had in here last week had GNOME 1.2 on them...</sarcasm>
Wow, that rant was about the worst thing one could expect from a professional writer.
What happened to eloquence and subtlety?
What happened to NOT drowning out one's own point by using all caps??
What happened to having one's own lawyer review the contents of the release to assure effectiveness??
Haven't read any of his work, have you? Harlan Ellison is about as subtle as the Death Star most of the time, and his only competition for "Most Opinionated Human on the Planet" is George Carlin. Stephen King once described the foreword to Harlan's book Strange Wine as making him
suspect I was experiencing something roughly similar to a six-hour rant delivered by Fidel Castro. Always assuming that Fidel was really on that day.*
*Danse Macabre, 1981.
I stopped taking Harlan seriously after seeing three or four of his rants on an "entertainment news" show that the SciFi Channel used to have on. Don't get me wrong, he does good work -- the "City on the Edge of Forever" episode of Star Trek was frickin' brilliant, as were some of the ideas he gave JMS for Babylon 5 -- but IMHO he's kind of a loon.
aside from the people who wrote it (and therefore already have a more up-to-date tree than Linus) and are prototyping the silicon, who actually has an IA64 system?
OSC, for one. NCSA, for another. Most of the larger vendors (IBM, HP, Dell, Compaq, SGI, etc.) have a few as well. Plus there's a software simulator from HP that lets you run IA64 programs (including the Linux kernel) on an IA32 box.
They exist, though admittedly only as engineering samples right now. But people do have them.
Maui Scheduler is an open source scheduler for batch systems like OpenPBS and IBM's LoadLeveler. It started life as a way to get around the braindead FIFO behavior in the default LoadLeveler scheduler.
Anyway, it has a simulation mode where you can feed it jobs (including ones with dependencies) and it will simulate running them. It's a handy way of checking your maui.cfg for pathological cases. :)
AFAIK, CPlant at Sandia is ~1600 Alpha nodes and counting. I think one of the genetic research companies has a 1000+ Intel node cluster (although it's used as a job farm rather than for parallel applications, IIRC).
The latest biggest Cray machines have all been Alpha clusters probably running some clusterable DU variant.
I think you mean the T3E, and while it's based on Alphas, it's about as far from a cluster as you can get and still have a parallel machine. It also runs UNICOS/mk (a microkernel version of UNICOS), not Digital/Tru64/name-of-the-week Unix.
See this link. SGI had a cluster of 8 dual Itanium systems with Myrinet on the floor of Supercomputing 2000, last month. I know because my code was one of the ones they were demonstrating on it; they've loaned us (OSC) 4 dual Itanium boxes and Myrinet to do porting and development on.
My guess (given there's almost nothing to go on in the article) is that IBM will be selling the same Itanium workstation chassis that SGI, Dell, and everybody else will be.
Here's how I figure that:
A G5 proc can do 4 64-bit FP ops per clock cycle (2 FPUs with each capable of doing a multiply/add op), so that's 8 GFlops/s per 2GHz proc, or 16 GFlop/s per dual-proc node. For 128 nodes, that's a little over 2.048 TFlop/s peak. Dongerra said they were getting 80% of peak, which would be 1.638 TFlop/s.
The thing that I *still* haven't heard explained in all the reporting on the VA Tech G5 cluster is how they're scaling the interconnect out to 1100 nodes. The biggest InfiniBand switch you can get right now is 128 ports, so somehow they're going to have to slave a whole bunch of IB switches together. There's a lot of ways they could do that, but the cheapest way (with minimal bandwidth between switches) could really hurt them on big parallel codes like the Parallel Linpack Benchmark. It's even worse for codes like parallel FFTs that are bound by bisection bandwidth (where all the nodes are talking to all the other nodes at full bandwidth simultaneously). It's not clear to me if VA Tech is planning on putting in an IB network that preserves bisection bandwidth or not.
The numbers I've seen seem to suggest that a 128-host IB fabric is only slightly more expensive than a Myrinet fabric of the same size, with similar latency characteristics for MPI programs and about 3x the bandwidth per host. Quadrics has better latency characteristics than IB, but about half the per-host bandwidth; I'm not sure about relative cost.
What I'll be very curious to see is how Apple and VT scales an IB fabric out to 1100 hosts. They'll either have to buy a boatload of "backplane" switches or sacrifice bisection bandwidth across the system as a whole. That'll probably depend on whether they have big bisection-bandwidth-bound codes, like parallel FFTs...
I'm not sure if it's legal either, but it sure reeks of a protection racket to me. (It's especially galling given that they haven't even established in court that they do in fact own what they claim to.) I've complained to my state attourney general about it. I'd like to think they'll look into it, but my state AG is one of the ones who caved on the MS antitrust settlement...
Here's the relevant section:
That's all well and good, but it blatantly ignores a couple hard truthes of the marketplace at the time:
- The architecture of the Itanium is so different from x86 that SCO's knowledge of previous Intel architectures was either useless or an active hinderance to development.
- By May 2001, it was obvious to anyone paying attention to that Linux was going to be the OS of choice among the early adopters of Itanium. A significant percentage (20% at least) of all Itanium-1 processors produced were going into compute clusters at academic HPC sites like NCSA and OSC (my employer), and those sites were all using Linux on their Itaniums. (NCSA bought their Itanium gear from IBM, IIRC.)
I think this suit is going to come down to SCO needing to prove the sentence italicized above, including identifying the trade secrets that IBM supposedly misappropriated. I have to think that's going to be extremely hard to prove, especially given that IBM wasn't even substantially involved in the Linux/ia64 port. (HP did most of the heavy lifting there, I think.)Personally, I think Bruce Perens is right -- SCO is trying to get someone with deep pockets to buy them, whether that's IBM, MS, or somebody else.
It's worth noting that the Earth Simulator is actually a cluster of vector mainframes (NEC SX-6s) using a custom interconnect. You could do something similar with the Cray X-1 if you had US$400M or so to spend.
If you're referring to the article I think you are, it was specifically talking in the context of weather simulation -- an application area where vector systems are known to excel (hence why the Earth Simulator does so well at it). The problem is that vector systems aren't always as cost-effective as clusters for a highly heterogeneous workload. With vector systems, a good deal of the cost is in the memory subsystem (often capable of several 10s of GB/s in memory bandwidth), but not every application needs heavy-duty memory bandwidth. Where I work, we've got benchmarks that show a cluster of Itanium-2 systems wiping the walls with a vector machine for some applications (specifically structural dynamics and some types of quantum chemistry calcuations), and others where a bunch of cheap AMDs beat everything in sight (on some bioinformatics stuff). It all depends on what your workload is.
12ax7s would certainly make sense, as they're still in production in several places (Russia, China, Yugoslavia) and thus relatively cheap. They're also widely used in preamps of guitar amplifiers, so you can find them at your local Guitar Center...
The EF86 was popular for hi-fi preamp applications like this in the '50s and '60s because they had lots of clean headroom, but they're not used as much any more because the ones still in production have a nasty habit of being microphonic. You'd also need twice as many of them, since they're a single pentode in roughly the same bottle as a 12ax7.
If you want this bill (or something like it) passed, you have to let your House Rep. and Senators know that you consider it important. A short email or, better yet, snail-mail message will work wonders; here's the one I sent off to the Ohio congresscritters:
The Access Grid is a project started at Argonne National Lab's Math and Computer Science Division to build a mostly open videoconferencing system over the Internet, using multicast audio and video streaming. You may want to take a look at their technology to see if they have ideas you can use.
Anyway, a "node" on the Access Grid consists of a room with at least three computers: a multihead box running Win2k for display to several video projectors, a computer running Linux for audio capture and playback, and another running Linux for video capture. The audio capture machine usually runs into a Gentner AP400, which does echo cancellation as well as phone bridging.
I don't know of anybody who has software that does this; sorry.
I've never seen an ad for Mononoke (though, of course, I own the DVD). It's interesting that Disney's compaining about sales but has never really pushed the film....
Part of the problem with Disney's handling of Princess Mononoke was that they distributed it like they would an arthouse movie (i.e. relatively limited release, not much promotion), so that it was virtually guaranteed to not make money. Where I live, only the little hole-in-the-wall arthouse theaters showed it -- the big cineplexes didn't.
What happens is that the corps have multiple tiers of products, and they really only put their energy into promoting their top-tier stuff in terms of expected ROI. This is kinda circular -- most times something won't become really popular unless it's promoted, but it won't be promoted unless the corps think it'll be really popular... This also shows up in the music biz -- take a look at how Hollywood Records (the Mouse again) promoted Queen's last album before Freddie Mercury died (read: they didn't), or how Elektra promotes 2nd-tier but solid-selling bands like Dream Theater (read: they don't). In DT's case, they've actually had trouble convincing Elektra to let them make new records because they don't go gold in the US right away, even though they do huge business in Europe and Japan.
But you cannot use your benchmarks to claim that P4 is faster than Athlon. Your benchmarks show that P4 with RDRAM is faster than Athlon with DDR DRAM. This does not imply that P4 is faster than Athlon, which is what you were trying to claim.
Actually, I was providing a counter-example to a specific comment of yours:
I can get Athlon-specific optimizations if I use the Portland Group compilers (-tp athlon -Mvect=prefetch), so claiming that "AMD has never had software optimized for their CPUs" is not entirely true. Certainly they're not well developed as the P4 optimizations in Intel's compilers, but they do exist.Also, you cannot make such a claim based on only one benchmark.
I have a hard time viewing the seven different codes in the NAS Parallel Benchmarks as "one benchmark"; the algorithms in the CG (conjugate gradient) benchmark for instance are very different from those in FT (Fourier transform) or IS (integer sort), and they stress different parts of the architecture. They do all favor high memory bandwidth systems, but that's the nature of the beast in HPC -- almost all scientific applications are memory bandwidth hungry.
OK, I was not aware of that, but tell me where I can actually buy them. I looked at Dell's site before I posted. If Dell doesn't have them, nobody else does.
Had you looked a little further at the URL I posted, it had a link to the company we bought both systems from, Ace Computers. Dell is dragging their feet on dual P4s for some reason; we tried to buy a dual P4 system from them, but they wouldn't ship us one until October and couldn't explain the delay. Smaller vendors like Ace were much more helpful.
You pointed to the single (type of) benchmark that makes P4 look good, though not because of P4's virtues. P4 has two channels of RDRAM giving it 1600 * 2 = 3200 MB/s of memory bandwidth. The Athlon machine in the benchmark you pointed to had a single channel of DDR DRAM, giving it 2128MB/s of memory bandwidth. Fluid Dynamic simulation is one of the _few_ things that can really take advantage of greater memory bandwidth. Therefore P4 wins this round by riding on Rambus's coat-tails.
And I suppose the fact that the P4's memory bus interface (256 bits wide) is twice as wide as that of the Athlon (128 bits wide) has no influence on the memory bandwidth available? Or that I can't get an optimizing compiler for the Athlon that's comparable to the Intel Fortran compiler?
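For what it's worth, the two peak bandwidth figures quoted above fall out of simple arithmetic: bus width in bytes, times transfers per second, times the number of channels. Here's a rough sketch. The bus widths and transfer rates (16-bit PC800 RDRAM channels, one 64-bit DDR266 channel) are my assumptions about the parts involved, not numbers taken from either post, and these are theoretical peaks rather than anything you'd actually measure.

```python
def peak_bandwidth_mb_s(bus_width_bits, mega_transfers_per_s, channels=1):
    """Theoretical peak memory bandwidth in MB/s (10^6 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * mega_transfers_per_s * channels


# Assumed configurations (not taken from the benchmark page itself):
p4_rdram = peak_bandwidth_mb_s(16, 800, channels=2)    # dual-channel PC800 RDRAM
athlon_ddr = peak_bandwidth_mb_s(64, 266, channels=1)  # single-channel DDR266

print("P4 + RDRAM :", p4_rdram, "MB/s")
print("Athlon+DDR :", athlon_ddr, "MB/s")
print("ratio      :", round(p4_rdram / athlon_ddr, 2))
```

Under those assumptions you get the same 3200 and 2128 MB/s figures, a ratio of about 1.5 on paper.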
Prove me wrong. Point me to the same benchmark where both P4 and Athlon have the same type of memory (both support SDRAM now, you know).
Look, I don't have the luxury of caring which processor is better in a "fair" test with the same memory, etc. -- my job is to figure out which system (processor/memory/IO/etc.) is fastest for our users' applications, which happen to look a lot like NPB. We got one each of the "current best of breed" in dual P4s and dual Athlons, ran the same codes on each, and assembled the results. For single-processor stuff, the P4 wins, sometimes by a lot. For I/O bound stuff, the Athlon wins right now, mainly because Intel couldn't design a decent PCI bridge to save their lives. (This is a matter of the sorta-broken [Athlon] beating the completely-broken [P4]; I fully expect a dual P4 board with a ServerWorks chipset to hand the Athlon its butt.)
I'll be the first to admit that my benchmark comparison page needs work -- I haven't had time to run Gaussian or GAMESS (computational chemistry codes) yet, or BLAST (a bioinformatics code) for that matter. I know of a similar study that had completely different results, I think largely because they couldn't use the Intel compilers due to some language extensions used by their codes (e.g. Cray-style pointers) that the Intel compilers don't support.
I was hoping to avoid this pissing match, but these claims are too ridiculous to let pass.
If software was optimized for P4 and Athlon, P4 would still be pathetic.
The NAS Parallel Benchmarks would seem to indicate you are wrong.
A Xeon is based on P3 core. Xeon is just an overpriced P3 with a larger L2 cache. My claim stands: there is no such thing as dual P4. If and when Xeons with P4 core are released, then we'll talk.
P4 Xeons have been available for several weeks now; we've had a dual P4 machine on site for about a month. Here's the /proc/cpuinfo from it:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 0
model name : Intel(R) Xeon(TM) CPU 1700MHz
stepping : 10
cpu MHz : 1695.874
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips : 3381.65
You really ought to check your facts...
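If you'd rather check this programmatically than eyeball it, something like the sketch below would do. It assumes the usual /proc/cpuinfo "key : value" layout shown above and starts a new record at each "processor" line. The test leans on two facts: Intel's P6-family parts (including the Pentium III) report cpu family 6, while the Pentium 4 generation reports 15 and adds the sse2 flag. The function names and the trimmed SAMPLE text are mine, purely for illustration.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # each "processor" line starts a new record
        if cpus:
            cpus[-1][key] = value
    return cpus


def looks_like_p4_core(cpu):
    """Heuristic: family 15 plus SSE2 points to a P4-generation core, not a P3."""
    flags = cpu.get("flags", "").split()
    return cpu.get("cpu family") == "15" and "sse2" in flags


# Trimmed copy of the output above (only the fields the check needs).
SAMPLE = """\
processor : 0
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme de pse tsc mmx fxsr sse sse2 ss tm
processor : 1
cpu family : 15
model name : Intel(R) Xeon(TM) CPU 1700MHz
flags : fpu vme de pse tsc mmx fxsr sse sse2 ss tm
"""

cpus = parse_cpuinfo(SAMPLE)
print(len(cpus), "processors; all P4-core:", all(looks_like_p4_core(c) for c in cpus))
```

On a live Linux box you would pass it the contents of /proc/cpuinfo instead of SAMPLE.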
OSC's Itanium cluster is physically located at the State of Ohio Computing Center (SOCC), which provides all the power and cooling. The power for the cluster is nineteen 30A/220V circuits, which go to a set of Sentry power controllers. The total heat load is about 195k BTU/hr, but the SOCC building is (over-)engineered such that this much of a heat load did not require additional AC capacity.
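To put those numbers in more familiar units, here's a back-of-the-envelope conversion. The conversion factors (about 3.412 BTU/hr per watt, 12,000 BTU/hr per ton of cooling) are standard; treating every circuit as fully loaded at 30A is my simplification, so the first figure is a nominal ceiling and not the actual draw.

```python
BTU_PER_HR_PER_WATT = 3.412142   # 1 W is roughly 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0     # one "ton" of cooling capacity

circuits, amps, volts = 19, 30, 220   # figures from the post
heat_load_btu_hr = 195_000            # figure from the post

# Nominal ceiling: assumes every circuit carries its full rated current.
nominal_capacity_kw = circuits * amps * volts / 1000
heat_load_kw = heat_load_btu_hr / BTU_PER_HR_PER_WATT / 1000
cooling_tons = heat_load_btu_hr / BTU_PER_HR_PER_TON

print(f"nominal circuit capacity: {nominal_capacity_kw:.1f} kW")
print(f"heat load:                {heat_load_kw:.1f} kW")
print(f"cooling required:         {cooling_tons:.2f} tons")
```

That comes to roughly 125 kW of nominal circuit capacity against about 57 kW of heat, or a bit over 16 tons of cooling.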
OSC had beta Itanium hardware last year as well. The difference is that this machine doesn't require signing a bunch of NDAs to use it.
NCSA has a slightly larger system of (more or less) the same design on order from IBM, and I think the installation is ongoing right now.
--Troy
The person who submitted the story (Troy Baer) is also the admin of the beast.
To give credit where credit is due, the admin of that system is Doug Johnson, who has done an enormous amount of work to get this thing working. I'm just a user support guy who writes lots of documentation and happens to dabble in systems stuff like Maui and PVFS in my Copious Spare Time[tm].
If CSAR had it before June of this year, it was prerelease, NDA-ridden, not-ready-for-prime-time hardware that was not supposed to be available for public use. (OSC has had a few Itanium boxes in house since September as well, but we didn't publicly advertise that fact since they were test systems.) OSC's machine is the first cluster of Itaniums that you don't have to sign a ton of NDAs with Intel to use.
I did read the PDF in that URL, BTW, and I'd love to know how you got SGI and Intel to extend the NDAs to CSAR's entire userbase -- we asked SGI about this at one point, and they just about had kittens.
The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.
You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.
(Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)
NONE of the major Unix vendors (Sun,HP,Compaq,IBM) have switched ANYTHING to GNOME. Sun and HP have announced they will provide GNOME 2.0 when it is released and STABLE.
<sarcasm> So that's why the Sunrays attached to the loaner Daktari system Sun had in here last week had GNOME 1.2 on them...</sarcasm>
News flash: Sun is shipping GNOME NOW.
Wow, that rant was about the worst thing one could expect from a professional writer.
What happened to eloquence and subtlety?
What happened to NOT drowning out one's own point by using all caps??
What happened to having one's own lawyer review the contents of the release to assure effectiveness??
Haven't read any of his work, have you? Harlan Ellison is about as subtle as the Death Star most of the time, and his only competition for "Most Opinionated Human on the Planet" is George Carlin. Stephen King once described the foreword to Harlan's book Strange Wine as making him
*Danse Macabre, 1981.
I stopped taking Harlan seriously after seeing three or four of his rants on an "entertainment news" show that the SciFi Channel used to have on. Don't get me wrong, he does good work -- the "City on the Edge of Forever" episode of Star Trek was frickin' brilliant, as were some of the ideas he gave JMS for Babylon 5 -- but IMHO he's kind of a loon.
aside from the people who wrote it (and therefore already have a more up-to-date tree than Linus) and are prototyping the silicon, who actually has an IA64 system?
OSC, for one. NCSA, for another. Most of the larger vendors (IBM, HP, Dell, Compaq, SGI, etc.) have a few as well. Plus there's a software simulator from HP that lets you run IA64 programs (including the Linux kernel) on an IA32 box.
They exist, though admittedly only as engineering samples right now. But people do have them.
Maui Scheduler is an open source scheduler for batch systems like OpenPBS and IBM's LoadLeveler. It started life as a way to get around the braindead FIFO behavior in the default LoadLeveler scheduler.
Anyway, it has a simulation mode where you can feed it jobs (including ones with dependencies) and it will simulate running them. It's a handy way for checking your maui.cfg for pathological cases. :)
AFAIK, CPlant at Sandia is ~1600 Alpha nodes and counting. I think one of the genetic research companies has a 1000+ Intel node cluster (although it's used as a job farm rather than for parallel applications, IIRC).
--Troy
The latest biggest Cray machines have all been Alpha clusters probably running some clusterable DU variant.
I think you mean the T3E, and while it's based on Alphas, it's about as far from a cluster as you can get and still have a parallel machine. It also runs UNICOS/mk (a microkernel version of UNICOS), not Digital/Tru64/name-of-the-week Unix.
See this link. SGI had a cluster of 8 dual Itanium systems with Myrinet on the floor of Supercomputing 2000, last month. I know because my code was one of the ones they were demonstrating on it; they've loaned us (OSC) 4 dual Itanium boxes and Myrinet to do porting and development on.
My guess (given there's almost nothing to go on in the article) is that IBM will be selling the same Itanium workstation chassis that SGI, Dell, and everybody else will be.