10-TFlop Computer Built from Standard PC Parts
OrangeTide writes "Using PCI host adapters and Xeon processors, engineers at Lawrence Livermore National Labs have achieved 10-TFlops relatively cheaply. More information can be obtained from this article at EETimes." Lately, Linux seems to be the operating system of choice for new supercomputers, and this one's no different. It's cool to see big iron made cheaply.
A commodity supercomputing cluster of these! (There has to be a better name for it, but I'm new here on Slashdot).
... which was specifically developed for running Doom III.
Like Teddy with an elephant gun.
As long as it doesn't see a full moon, then it won't turn into...
Beowulf!
zzziiiippp!!!!(I love the smell of nomex in the morning!)
>The 1- to 10-teraflops processing range is opening up a revolutionary capability for scientific applications
In the not too distant future, that kind of processing power could very well be available in home PCs. Imagine what that would do to...well, I mean, dang it, what the heck will we do? Game frame rates can only go so high. Even realism of 3D graphics may have its limits. Oh sure, we'll find something, but it's difficult for us to imagine now...
Where I work (I can't say where... I've signed papers...) we have replaced an SGI 'super computer' (or mini computer, whatever, a big number crunching beast of silicon!) with a Beowulf cluster. This not only gives us great scalability, but also lots of FLOPS per dollar (or rather, krona :^P).
Then the world will finally see the 4000 Playstation 2's that Saddam used to build a supercomputer
Look, all the 'cool' people are doing Linux, right? But BIG IRON is clearly trying to suck you and your money in. Oh, it's so cheap, but they don't tell you about the hidden costs, do they?
Just say no to BIG IRON!
[o]_O
can be found here.
So the Teraflops they're mentioning are just a theoretical upper bound, don't get too aroused when you see it.
The Raven.
The Raven
The important part isn't the number of FLOPS (to get those you can just keep buying more PCs until you reach the desired number) but the performance in applications which are not 'embarrassingly parallel'. In other words, how good is the interconnect between machines? The article talks about a new network to replace Gigabit Ethernet.
-- Ed Avis ed@membled.com
The system has a few unique features that the lab says will facilitate applications performance, including a fast, custom-made network that taps into an enterprisewide file system.
"This network approach is nice because we can use a standard PCI slot on each processor node, which gives a 4.5-microsecond latency," he said, as opposed to 90-microsecond latency for Gigabit Ethernet.
The boards are linked by a network assembled by Linux Networx into a clustered system that will have 960 server nodes.
The file system, called Lustre, uses a client/server model. Large, fast RAM-based memory systems support a metadata center, and data is represented across the enterprise in the form of object-storage targets. "Being able to share data across the enterprise is an exciting new capability
I think this is especially interesting, because it seems to glue together pieces from traditional clustering and distributed or metacomputing. Is there some site for this project with more details?
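To put the two latency figures quoted from the article side by side, here is a rough sketch. It assumes the Gigabit Ethernet figure is 90 microseconds, and it only counts one-way latency for tiny messages, ignoring bandwidth and software overhead. The function name is made up for illustration.

```python
# Latency figures quoted from the article. The Gigabit Ethernet value is
# assumed to be in microseconds, like the PCI-attached network's.
CUSTOM_NET_LATENCY_US = 4.5
GIGE_LATENCY_US = 90.0


def round_trips_per_second(latency_us: float) -> float:
    """Upper bound on back-to-back tiny-message round trips over one link.

    A round trip costs two one-way latencies; payload transfer time and
    software overhead are ignored in this sketch.
    """
    one_way_s = latency_us * 1e-6
    return 1.0 / (2.0 * one_way_s)


if __name__ == "__main__":
    ratio = GIGE_LATENCY_US / CUSTOM_NET_LATENCY_US
    print(f"Gigabit Ethernet latency is {ratio:.0f}x the custom network's")
    for name, latency in (("custom network", CUSTOM_NET_LATENCY_US),
                          ("Gigabit Ethernet", GIGE_LATENCY_US)):
        rate = round_trips_per_second(latency)
        print(f"{name}: at most {rate:,.0f} round trips per second")
```

For code that synchronizes between nodes many times per second, that factor of 20 matters more than the raw FLOPS count.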
So please explain this. I mean, I have two Linux boxes in my room and each has a free PCI slot. What do I need to do to network them directly over PCI?
I have a lot of movies to convert to DIVX...
You know, it wouldn't be stupid of Apple to try to build some code for arbitrarily large clusters into Darwin. It would really be a prestige coup if a Mac cluster became a top-500 computer.
I told you my other Boxen was a 1000 node beowulf cluster... But no one believed my sticker...
Sigs? We don't need no stinking sigs!
"There is no reason anyone would want a computer in their home." - Ken Olson, President, chairman and founder of Digital Equipment Corporation, 1977.
Anyways, what I'm trying to point out is that it is actually becoming very convenient to build a super computer with lots of PCs that just lie idle. I am not sure if Saddam has heard about cheap Linux systems. But what if he could build a super computer cluster?
Boy this gets interesting and scarier at the same time.
Uuh, I mean null-card connection. I have never really looked at the PCI spec from an electrical engineer's standpoint, but there are probably power leads, data leads, timing leads, and ground leads on there.
The data leads should be easy...TX to RX. Although they may use a full-duplex lead where the data shares the bus based on clock pulses.
The power could be dropped, as both machines already have the proper power requirements. The ground leads could be tied together if you wanted, but dropping them shouldn't have too much impact on the final outcome.
The tricky part would be the clock pulses. In order to keep the data integrity, you need to have both machines on the same clock. The easy way would be to take the crystal from one motherboard and wire it to the other. Same crystal, same clock pulse.
Then drivers would be needed to make the other computer look like an attached device. Shouldn't be too difficult. Just take a NIC driver and modify it...heavily.
I think an easier option would be to share data across the IDE bus. Make an IDE driver look like a NIC driver and send IP across IDE. In fact, I remember Linux Journal publishing an article about someone doing IP over SCSI about 2 years ago. Get some SCSI cards and make your own version of a CDDI network ring.
I'd rather you do it wrong, than for me to have to do it at all.
"I was doing my nuclear simulations on the ASCII White and it was like BEEP BEEP BEEP...and like half my work was gone..."
"I'm tired of all this 'Aren't humanity great' bullshit. We're a virus with shoes" - Bill Hicks
If you check current prices, the Xeon isn't much more expensive than the Athlon MP. Pricewatch has the 2.2GHz Xeon at $245 and the Athlon MP 2200+ at $204. Each of these machines is interconnected with a Quadrics board that probably costs more than $2000, so an extra $80 for CPUs isn't much.
Why not use AMD anyway? There are Xeon motherboards with chipsets like the Intel E7500 and ServerWorks GC-HE that have greater memory bandwidth and PCI bandwidth than the AMD 760MPX. For many problems in scientific computing, memory bandwidth is what is important, not CPU speed.
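The "$80" figure follows from the quoted prices if each node carries two CPUs. That is an assumption here, since the comment doesn't say so. A quick sketch of the arithmetic, with illustrative names:

```python
# Prices quoted in the comment above (US dollars).
XEON_PRICE = 245
ATHLON_MP_PRICE = 204
INTERCONNECT_PRICE = 2000  # "probably costs more than $2000"

# Assumption, not stated in the comment: two CPUs per node.
CPUS_PER_NODE = 2


def extra_cpu_cost_per_node(cpus_per_node: int = CPUS_PER_NODE) -> int:
    """Extra dollars per node for choosing Xeons over Athlon MPs."""
    return (XEON_PRICE - ATHLON_MP_PRICE) * cpus_per_node


def extra_cost_vs_interconnect(cpus_per_node: int = CPUS_PER_NODE) -> float:
    """Extra CPU cost as a fraction of the interconnect board's price."""
    return extra_cpu_cost_per_node(cpus_per_node) / INTERCONNECT_PRICE


if __name__ == "__main__":
    extra = extra_cpu_cost_per_node()
    share = extra_cost_vs_interconnect()
    print(f"extra CPU cost per node: ${extra}")
    print(f"that is {share:.1%} of a ${INTERCONNECT_PRICE} interconnect board")
```

Under that assumption the CPU premium is a few percent of what the network card alone costs.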
Anyone have any experience using (Open)MOSIX? I have a partially CPU-bound application (automatic part is IO-bound, manual part is CPU-bound) in Perl, Apache and MySQL. Anyone got experience with this stuff?
:)
For those who don't even know what MOSIX is, it is a kernel patch that essentially creates a virtual computer out of several boxes. They claim they will scale your application as long as you have multiple processes (they migrate them as needed) - without any coding on your part.
Since I'm looking for extra performance with limited resources, this looks like a potentially easy way out
Stop the brainwash
The title says it all. Big Iron is _engineered_. No matter how big or how spiffy a Beowulf cluster is, it's still just a bunch of PC motherboards kludged together with a bunch of network cards. There is a reason Crays are expensive - they are _worth it_ from a performance standpoint, because not every problem lends itself easily to the solution of a Beowulf cluster. Some problems require the exchange of a lot of data between a lot of nodes, and a little math will show that it won't take much data interchange to saturate even a GigE switch. Adding more machines is not going to help; craftily designing and overengineering the network _might_, but by the time you get this whole damned thing glued together well enough to approximate a Cray's performance, you'll have spent enough to have just flat-out bought a Cray in the first place.
As others have noted, while this thing may have a theoretical peak performance of 10 TFLOPS, I'm willing to bet that number goes down like Monica Lewinsky on Quaaludes when you feed this magical supercomputer a problem that's _not_ suitable for distributed.net (i.e. one where computations on one node are dependent on computations on another node, like fluid-dynamics problems, turbulence, etc.)
Yeah, it's interesting as a curiosity, but this is by no means spectacular. Beowulf is good for what it's good for, which is a "poor-man's supercomputer" that works well for coarsely-parallel problems that don't require a lot of internode communication. It's not the Philosopher's Stone, folks.
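Here's roughly what that "little math" looks like. This is a back-of-the-envelope sketch, not a measurement: the per-step compute time and exchange size are made-up numbers, and latency, protocol overhead and switch contention are all ignored (each of which would make things worse).

```python
GIGE_BITS_PER_SECOND = 1e9  # nominal Gigabit Ethernet line rate


def exchange_time_s(bytes_per_step: float,
                    link_bits_per_s: float = GIGE_BITS_PER_SECOND) -> float:
    """Seconds one node needs to push bytes_per_step through its link."""
    return bytes_per_step * 8 / link_bits_per_s


def parallel_efficiency(compute_s: float, bytes_per_step: float,
                        link_bits_per_s: float = GIGE_BITS_PER_SECOND) -> float:
    """Fraction of each step spent computing rather than waiting on the wire.

    Assumes communication is not overlapped with computation.
    """
    comm_s = exchange_time_s(bytes_per_step, link_bits_per_s)
    return compute_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical workload: 0.1 s of math per step, then a 10 MB exchange.
    compute_s = 0.1
    bytes_per_step = 10e6
    comm_s = exchange_time_s(bytes_per_step)
    eff = parallel_efficiency(compute_s, bytes_per_step)
    print(f"exchange takes {comm_s * 1000:.0f} ms per step")
    print(f"efficiency: {eff:.0%} of peak")
```

With those invented numbers the node spends almost as long talking as computing, and adding more nodes doesn't change that ratio.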
-SD
I am Chaos. I am alive, and I tell you that you are Free. -Eris
The distributed memory Crays (T3D, T3E) are just the same: boards and network cards. The processors they use are not faster than the last generation PC processors. The difference is the NICs, which have about 10 times more bandwidth and 10 times less latency (compared with standard fast ethernet cards).
There is the difference. As you say, for certain problems, this means that the whole machine is about 10 times faster than a Beowulf.
However, if/when conventional NICs are fast enough, especially in terms of latency, both systems can be equivalent again. In the meantime, a lot of people are trying to develop parallel algorithms that minimize the number and size of the messages, which allows cheap PCs to be used as supercomputers.
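A small sketch of why cutting the number of messages helps, using the usual "latency plus size over bandwidth" cost model. The 100-microsecond latency is a made-up placeholder; 12.5 MB/s is just the nominal Fast Ethernet line rate (100 Mbit/s). Function names are illustrative.

```python
FAST_ETHERNET_BYTES_PER_S = 12.5e6  # nominal 100 Mbit/s line rate
ASSUMED_LATENCY_S = 100e-6          # hypothetical per-message latency


def time_separate(n_msgs: int, bytes_each: float,
                  latency_s: float = ASSUMED_LATENCY_S,
                  bandwidth: float = FAST_ETHERNET_BYTES_PER_S) -> float:
    """Total time when every message pays the latency on its own."""
    return n_msgs * (latency_s + bytes_each / bandwidth)


def time_batched(n_msgs: int, bytes_each: float,
                 latency_s: float = ASSUMED_LATENCY_S,
                 bandwidth: float = FAST_ETHERNET_BYTES_PER_S) -> float:
    """Total time when the same payload goes out as one large message."""
    return latency_s + n_msgs * bytes_each / bandwidth


if __name__ == "__main__":
    n, size = 1000, 100  # a thousand 100-byte messages
    separate = time_separate(n, size)
    batched = time_batched(n, size)
    print(f"separate: {separate * 1000:.1f} ms")
    print(f"batched:  {batched * 1000:.1f} ms")
    print(f"speedup:  {separate / batched:.1f}x")
```

The payload is identical in both cases; the gain comes entirely from paying the latency once instead of a thousand times.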
Answer: Depends on his intent. If he is using it for finding extra terrestrial life, by all means he can go ahead, but if he is using it to test one of his biological weapons then he is obviously bad.
What if he finds some ETs who can help him out with some guy, known as GW Bush, who wants to invade his country.
Two-thirds of the cost over a computer's three-year lifetime is the electricity and the cooling system. When TCO is counted, a Transmeta-based cluster or the super-dense SGI cluster announced yesterday is cheaper.
"We have been using the File Transfer Protocol over Gigabit Ethernet, but now we will be able to read files directly from any available disk."
translation
We used to use FTP over Gig-E but came up with something more L337.
Trolling is a art,
- Depends on the Xeons they are using. The 'old' Xeons are around the same cost as their AMD counterparts. The 'new' Xeons have large L3 caches (1M and 2M).
- The AMD SMP chipset is slow (memory bandwidth) compared to the newer Intel chipsets.
- IIRC, the P4s use less power than the Athlons, probably this is not as important but it is there.
I'd like to see a comparison of a newer dual Xeon machine vs. a good dual AMD to see the performance difference. I would suspect that the dual Xeon machine would be a bit faster.
a race between engineers trying to make faster and better computers and Microsoft trying to make more bloated and processor-heavy operating systems. So far, Microsoft is winning.
Technoli
One of the benefits of computers is the ability to solve a problem with iteration rather than trying to come up with a classic "equation" and solve it. When I first entered the job market I had a trusty Pickett N4ES slide rule (and an N600-ES pocket slide rule) and had to first explain a problem with an equation and then solve the equation (from the "inside out" which was why HP calculators with RPN were so popular with engineers when they first came out versus the TI models... but I digress).
With the introduction of the HP-35 calculator (the "electronic slide rule") we could solve problems by just crunching the numbers at our desks. With the availability of programmable calculators (HP-67/97 and HP-41 - both of which I still use... but then I still use the slide rules too) we could program them to iterate through problems.
Not as elegant, certainly. But lots more efficient. And I'm sure that most of us have lost some of our old abilities to "see" problems in math... and perhaps some students never really learn that. But the jobs still get done and the tools still keep making it easier. I'm thinking about a Beowulf cluster for our office, actually.
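For anyone who hasn't seen the "iterate instead of solve" style, a minimal example is Newton's iteration for a square root: guess, refine, repeat until the answer stops moving. This is only an illustration picked for this thread, not something from the calculators mentioned above.

```python
import math


def newton_sqrt(a: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Approximate sqrt(a) by repeating x -> (x + a/x) / 2."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0  # any positive starting guess converges
    for _ in range(max_iter):
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) <= tol * nxt:  # relative change is small enough
            return nxt
        x = nxt
    return x


if __name__ == "__main__":
    approx = newton_sqrt(2.0)
    print(f"iterated: {approx:.12f}")
    print(f"library:  {math.sqrt(2.0):.12f}")
```

No algebra needed, just the same step applied over and over, which is exactly the kind of loop a programmable calculator or a cluster node is good at.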
No one ever had to evacuate a city because the solar panels broke!
The interesting thing about this setup is that it doesn't work like the traditional supercomputer. It's more like a community of totally independent computers all willing to work on the same problem.
The system employs a whole lotta control nodes that spend their whole time trying to assign work out to the worker nodes. The problem then becomes not just parallelizing the work but coordinating the workers. Apparently with this cluster design, it's not all as cut-and-dried as with a "real" supercomputer. They have been able to do some really cool stuff, though. Like, for example, any computer in the cluster can address the memory on any other computer.
The admins I talked to said they weren't really sure just how fast the system could go, because they could never get it to operate at full capacity. They said the fastest they'd gotten it to go was 4 TFlops, but they figured they were only at 40% of theoretical capacity.
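Those two numbers can be checked against each other. A quick sketch, taking the 4 TFlops and 40% figures at face value (the function name is illustrative):

```python
def implied_peak_tflops(achieved_tflops: float, fraction_of_peak: float) -> float:
    """Peak rate implied by an achieved rate and the fraction of peak it represents."""
    if not 0 < fraction_of_peak <= 1:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return achieved_tflops / fraction_of_peak


if __name__ == "__main__":
    peak = implied_peak_tflops(4.0, 0.40)
    print(f"implied theoretical peak: {peak:.1f} TFlops")
```

So 4 TFlops at 40% lines up with a theoretical peak of about 10 TFlops.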
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925