Affordable Supercomputers
Brian writes "CNN Online has a story on a company that has introduced supercomputers for under $100,000 and they hope to have four-layer supercomputers for under $4,000 before long. The computers use AMD processors and according to the company's Web site, they come running Linux. I can't wait to add one of these to my collection!" As always, we've heard about Patmos before. Check out an older story here.
Starbridge Systems claim to be alive and kicking, but something does not seem right on their site. In other words, they are high on hyperbole, but when it comes to specifics, like independent benchmark tests for example, they are strangely silent. Also, two so-called "ground breaking" seminars/presentations in Florida appear to have fallen flat. I see vapourware signs everywhere.
Since K6es aren't SMP-able, your chassis count needs to be == to your processor count, so:
100 * 100 == 10,000.
And I'd imagine you want more than 1.5 fiber net cards per proc? More like 4, at a low estimate?
400 * 150 == 60,000.
And if you can get a MB + case + hd for $100, I have a bridge to sell you. At best:
$150 * 100 = 15,000
So, with your admittedly low RAM counts, that's:
10000 +
60000 +
15000
------
85000 just in hardware....
Knock yrself out....
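For anyone who wants to check the sums above, here is a rough sketch of the same arithmetic. The labels on each line item are my reading of the post (in particular, that the first line means 100 chassis at $100 each), so treat them as assumptions rather than a quote.

```python
# Line items as read from the post above: (label, quantity, unit price in USD).
# The labels are an interpretation of the post, not exact wording.
line_items = [
    ("chassis", 100, 100),
    ("fiber net cards", 400, 150),
    ("MB + case + HD", 100, 150),
]

def subtotal(quantity, unit_price):
    """Cost of one line item."""
    return quantity * unit_price

total = sum(subtotal(quantity, price) for _, quantity, price in line_items)

for label, quantity, price in line_items:
    print(f"{label:16s} {quantity:4d} x {price:4d} = {subtotal(quantity, price):6d}")
print(f"{'total':16s} {total:22d}")
```

The total matches the 85000 hardware figure quoted in the post.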
--
Thanks for the URL.
:-)
It's not exactly the same concept since the Transmeta chips aren't gate-level reconfigurable computers, but the dynamic compilation stage seems to have close parallels in both products.
I never did learn though whether Starbridge use layout caching in order not to have to recompile parts of the code already traversed previously. It sounds to me like this nice feature of Crusoe would be equally useful in dynamic RC designs like Starbridge's.
Regarding CPU cores versus FPGA arrays, an FPGA like Xilinx's XC series (6200 onwards) can be regarded as just a core for a microcoded processor, because layout control is performed by writing the layout info into a memory-mapped store, which in concept is no different from writing microcode to a conventional microcoded controller. It might be a bit difficult to identify an "instruction set" among all this funky layout data, but hey, when discussing concepts one has to be flexible.
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
Chassis count was equal to CPU count. The 32 is the price of a chassis.
--
>400 * 150 == 60,000.
>And if you can get a MB + case + hd for $100, I have a bridge to sell you. At best:
How about a hub? Gadzoox and 3com have them out.
--
One of the serious problems with massively parallel supercomputers is heat dissipation. I'll leave the rest of the calculation as an exercise for the reader.
----
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Hopefully Linux will benefit from the association with quality, high-powered, serious computing applications at affordable licensing prices, while those other guys fill the mass market for wealthy customers who need a superwastey bloto-proactive 3D vr wiz to help them figure out 'double clicking'.
The Scarlet Pimpernel
try { do() || do_not(); } catch (JediException err) { yoda(err); }
I haven't heard anything. To tell the truth, I suspected fraud... that it was a bogus claim. Now.. it would be nice if it wasn't.. but they didn't give *any* technical specs... just made some bold claims.
The url is www.starbridgesystems.com and they appear to be alive, well, and selling systems.
Though I wouldn't call this the same concept as Transmeta. FPGAs are fully programmable, and have no real core; it all lies outside of the FPGA.
Lowmag.net
200 MHz K6-2s aren't on the market anymore. A query on Pricewatch for 300 MHz K6-2s turned up chips for 27 dollars. He had 11 processors; that's well under 300 dollars.
Much to the dismay of those tearing apart the system based on this statement, it is the BUS that is 200 MHz, and last I checked the Athlon ran at that speed (or rather at 100 MHz, transferring data on both the rising and falling clock edges). Now, if you look at the statement on the site where they mention a 200 MHz bus, the only processor that could be used in this system is an Athlon.
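To make the "200 MHz" bus figure concrete, here is a small sketch of the arithmetic: a 100 MHz clock that moves data on both edges gives 200 million transfers per second. The 64-bit bus width used for the bandwidth figure is an assumption on my part; the thread itself does not state it.

```python
def effective_rate_mhz(clock_mhz, transfers_per_clock):
    """Millions of transfers per second, the number usually quoted as 'bus MHz'."""
    return clock_mhz * transfers_per_clock

def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits):
    """Peak bandwidth in MB/s for a bus of the given width."""
    return effective_rate_mhz(clock_mhz, transfers_per_clock) * bus_width_bits // 8

BUS_WIDTH_BITS = 64  # assumption: not stated anywhere in the thread

print(effective_rate_mhz(100, 2))                    # double-pumped 100 MHz clock
print(peak_bandwidth_mb_s(100, 2, BUS_WIDTH_BITS))   # peak MB/s under the assumption
print(effective_rate_mhz(100, 1))                    # ordinary 100 MHz bus for comparison
```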
Lowmag.net
-----------
"You can't shake the Devil's hand and say you're only kidding."
The real benefits are going to be for small businesses, small universities, high schools and other such organizations that will begin to see the need for more processing power than an individual PC can handle: supporting a large user base, applications run over a network (the trend back towards consoles hooked to a mainframe), and such. These are where this type of machine will really be useful.
The organizations that purchase these cheap supercomputers will be looking especially at the tradeoffs of getting one supercomputer with 50 console stations vs. 50 high-end PCs.
It's a fantastic leap in providing computing power to everyone, in my honest opinion.
----
Lyell E. Haynes
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
On the other hand,
If you put 100 average computer cpu's to work together in a beowulf cluster,
it would be a 100 times as fast as your average computer - thus matching
your definition of the term "Super computer".
Please take a look at the Top-500 list of Supercomputers (forgot the link,
use a search engine)...
The real news in this story is the target price of $100k.
Linux/Alpha-based supercomputers are old news and have been around for quite some time now...
A Top-500 supercomputer for only a few thousand US$ would definitely shake
the foundations of this market.
I want to see what the big players (IBM, Sun, NEC, Fujitsu) have to say
[if you note the absence of SGI in the above list, it was intentional.]
--
Jor
200 Mhz K6-2's never WERE on the market. The K6-2 started with 300. The K6 was available from 166-300, but was replaced fairly quickly by the K6-2 at 300, 333, and 350.
I think the 200 Mhz K6-2 must be a typo. I think these are actually 450 Mhz K6-2's.
--- "So THAT's what an invisible barrier looks like!" - Time Bandits
K6-2/450's are 100 Mhz bus. Other than that, your post sounds accurate.
--- "So THAT's what an invisible barrier looks like!" - Time Bandits
A Few points:
- CNN was obviously erroneous about the Patmos using AMD K6s instead of AMD Athlons.
-- A 200 MHz bus and the word "cheap" obviously suggested a K6 to the semi-technically knowledgeable author. In fact ( as has been pointed out here several times ), the only 200 MHz bus available for a PC is the Athlon's.
-- They mentioned Patmos reaching 1 GHz within the year. The K6 does not have the potential to reach this speed; in fact, few processors in the world other than the Athlon are currently capable of reaching this goal within a year, especially without requiring a new motherboard, which obviously would be bad for Patmos, since I assume they provide _some_ form of customization of their motherboard. Or at the very least they have carefully selected a board, and would not think kindly of choosing a new board so soon after their initial product release.
- The unqualified use of the word super-computer.
I've noticed several posts from people thinking that they could design a "super computer" even cheaper than Patmos. But really, all they could achieve is a "theoretically fast" machine. A supercomputer is the sum of all its parts, and therefore the weakest link can break the chain.
As a disclaimer, I am not formally trained in super-computer concepts, but much of this is based on my experience and common sense ( which may differ from horse to horse ).
A super-computer must have top-tier performance ( obviously ), data integrity ( you don't spend half a mil just to have a core dump or system freeze ), and reliability ( 24 / 7 uptime while performing its work ). It should also be scalable ( grows with your company or the task as is fiscally justifiable ).
--Simple points: When selling a super computer, you must choose high quality parts ( or at least make things n-way redundant ).
In my experience, IDE drives don't cut the mustard, due to their high-volume, minimal-quality, price-focused nature ( skimp on a heat-sink or shock absorber to save 5 bucks per drive, etc ). When you buy IDE, you think disk space and low price. When you buy SCSI, you think of performance and quality ( and usually expense ). Thus they are designed based on that marketing paradigm.
An IDE drive also uses an IDE controller, and is thus inherently sequential. A SCSI device can queue multiple independent requests and use the disk geometry to determine an optimal seek path. Additionally, due to the paradigm above, more cache and higher rotational speeds are applied to SCSI devices ( not that they couldn't be applied to IDE.. but why should they? )
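As a rough illustration of why queued requests help, here is a toy comparison between servicing requests strictly in arrival order and reordering them into a single sweep. It is a simplified one-dimensional model with made-up block numbers; real drive firmware also accounts for rotational position and actual geometry, and nothing here is taken from a SCSI specification.

```python
def seek_distance(start, order):
    """Total head travel when requests are serviced in the given order."""
    distance, position = 0, start
    for block in order:
        distance += abs(block - position)
        position = block
    return distance

def elevator_order(start, pending):
    """Sweep upward from the head position, then come back for the rest."""
    upward = sorted(b for b in pending if b >= start)
    downward = sorted((b for b in pending if b < start), reverse=True)
    return upward + downward

# Hypothetical queue of pending block requests and starting head position.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

print("arrival order:", seek_distance(head, pending))
print("reordered:    ", seek_distance(head, elevator_order(head, pending)))
```

A device that can only take one request at a time is stuck with the first number; a device holding the whole queue can get closer to the second.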
As for a network, some referred to hubs and ethernet. Ethernet does _not_ scale well. Sure, you can get a faster / more intelligent hub, but you never achieve the maximum theoretical bandwidth. I'm not completely sure of the network technology used here, but it seems to be peer-to-peer and bi-directional ( to facilitate rapid acks ).
--Memory. This is really the key to a good super-computer design. SGI made use of wide 256-bit multi-ported memory buses with interleaved memory ( 16 consecutive bytes were segmented across 4 memory controllers, thus linear reads were faster, AND independent concurrent accesses had a statistical speed-up ). Of course a 256-bit memory bus is expensive, especially in a multi-CPU configuration. SUN's Starfire, for example, has up to 64 fully interconnected CPUs ( don't recall the bus width ). This required a humongous backplane with added cost.
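The interleaving idea can be sketched in a few lines. This assumes the 16 bytes are split into four 4-byte chunks, one per controller, which is my guess at the granularity; the post only says 16 consecutive bytes were spread across 4 controllers.

```python
CONTROLLERS = 4
CHUNK_BYTES = 4  # assumption: 16 bytes / 4 controllers

def controller_for(address):
    """Which memory controller serves this byte address."""
    return (address // CHUNK_BYTES) % CONTROLLERS

def controllers_touched(start, length):
    """Distinct controllers involved in reading `length` bytes from `start`."""
    return sorted({controller_for(a) for a in range(start, start + length)})

# A linear 16-byte read is spread over all four controllers in parallel.
print(controllers_touched(0, 16))
# Two unrelated accesses usually land on different controllers.
print(controller_for(0x1004), controller_for(0x2008))
```

A linear read keeps all four controllers busy at once, and two independent accesses only collide when they happen to map to the same controller, which is the "statistical speed-up" mentioned above.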
AFAIK, the Athlon uses regular SDRAM ( and a cost-effective solution would have made use of off-the-shelf parts ). SDRAM is nicer than older PC-based memory in that its pipelining allows multiple concurrent memory-row accesses within a chip. Several memory addresses within the same row can be in the pipeline, and up to 2 rows can have elements in the pipeline. This is a more sophisticated approach than interleaved memory, BUT it introduces a longer / slower pipeline. RAMBUS furthers this concept by narrowing the bus width and furthering segmentation. It allows greater concurrency, but latency ( and thus the cost of linear logical dependency ) is increased.
RAMBUS's theory of high latency, high concurrency benefits non-linear programming, such as Itanium's ( Intel's Merced ) deep speculative memory prefetching, or ALPHA and SUN's multi-threaded processors ( where cache misses cause an internal thread-level rapid context switch, thus hiding the latency ). Existing x86 architectures, however, cannot fully take advantage of such concurrency, and the net effect is slower execution time for linearly dependent algorithms ( non-local/consistent branching, and non-parallelizable math calculations ). In this case, making use of high-speed / low-latency interleaved EDO ( as is / was done in several graphics boards ) seems a better alternative ( but hasn't come to pass in mainstream motherboards ).
--Multi-CPU. This is an interesting topic. Multiple CPUs can connect to the same memory ( with large internal caches ), or can have a NUMA architecture with shared segmented memory ( isolated, with interconnecting buses ). Or they can be autonomous units connected via a network. There are pros and cons to each mechanism. The last requires the most redundant hardware ( which is actually good in terms of hot-pluggability ), and has the slowest inter-CPU communication. It thus works well in message passing systems, as opposed to symmetric decomposition of large data arrays ( eg parallel vector processors ). Personally, I like the NUMA approach the best, but it requires proprietary hardware, and hot-pluggability would have to take the form of a VME bus etc.
It would seem that the approach here is multi-CPU ( 2 or 4 ) to perform a single task. Concurrent threads are distributed across machines in a message passing system ( hopefully with minimal data-sharing ). The AI controller probably handles messaging and arbitration, in addition to the advertised load balancing. The multiple CPUs on a board are most likely for redundancy. My guess is that 2 or 3 CPUs are used for user-thread redundancy and 2 or 1 CPUs are dedicated to OS operations ( using spin loops in the user threads ). Thus minimal context switches are required, and minimal memory bandwidth is used ( since only one virtual CPU is ever accessing memory at a time, though 2 CPUs are simultaneously requesting that information ). They may actually allow the Linux scheduler to rotate procs, but as I've learned, this isn't Linux's strong point. A single-tasked CPU is a happy CPU.
I know SUN has optimizations for context switching ( keeping most pertinent info within the CPU, along with a unified virtual memory model, as opposed to the offset-based x86 virtual memory model ). Unfortunately this is offset by register window swapping, but such context-switching-centric processors would allow for more efficient concurrent operations such as multi-threaded apps ( such as Java. Before you laugh, one application of this low-cost supercomputer is web servers.. And servlets are an emerging technology; people will ask the question, how can I make this existing code run faster in a short period of time ).
-Concept. ASICs / FPGA's. SGI had the concept of making a simple, cheap, reliable, and fast logic CPU, then couple it with extremely proprietary logic / processing chips that offset the code logic. Combinational logic is faster than any sequential logic, though much more prone to bugs, and higher production costs. High performance reprogrammable FPGA's could help the industry, since the hardware could maintain a high volume, low cost ( as with current CPU's ). Thus you could make PCI / AGP expansion boards that handle load-balancing, message passing, java-extensions, OS-operations etc.
I'm sure their AI logic is done similarly, but it's a completely separate box. My thought would be that the "boxen" would have these expansion boards, and the customer could request optimizers for, say, the web, or weather calculation, chess designs, what-have-you. The goal is that these expansion boards become as common as modern graphics accelerators, modems, sound cards etc, without having to go through all the trouble of designing the hardware of those boards.
-Michael
They currently use AMD K6-2's in their machines, not Athlons. But they say the 1-Gig Athlon-based machines are coming soon.
When I'm singing a ballad and a pair of underwear lands on my head, I hate that. It really kills the mood.
-Tom Jones
The list is at http://www.netlib.org/benchmark/top500.html
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
I too am very skeptical of this. From what I can tell of their web page, they don't scale very far since they only have 8 compute nodes. Even assuming 4 processors per node (and I think they only have 1), that would only be 32 processors. Granted, for 99.9% of the users out there, that is a whole boatload of processors. However, our large systems go to 512 processors (1024? Maybe in not too long). The Cray T3E goes up to 2048p. Even our clustering product generally has more than that. The cluster we recently installed at Ohio Supercomputer Center was 256 compute nodes, I believe. I just don't see how 8 to 32 processors is going to compete with that. Now their reliability looks pretty solid, though.
Another problem, in general, with x86 "supercomputing" is that a lot of scientific code out there likes 64 bit math. Merced^H^H^H^H^H^HItanium, MIPS, Alpha, and the Cray Vector processors have a nice lead there.
Lastly, someone previous in this thread said something to the effect "wouldn't it be great to make the top 500 for In short, I don't consider this to be a supercomputer. An HA cluster, maybe. But it's hard to tell since their site is pretty sparse on technical details. I am *very* suspicious of a "supercomputer" company that doesn't post benchmarks. One of Seymour Cray's rules of supercomputing is that your machine should be the fastest in the world at *something*. Lastly, they need to learn how to put in "'s instead of ?'s. Their HTML is inept at best...
Go Badgers! -- #include "std/disclaimer.h"
I speak for myself, not for SGI.
// code runs normally
...
#define OPTIMIZE
...
// this code compiled to FPGA
...
#undef OPTIMIZE
They were quoting 20x speed increases compared to a standard Pentium. The downside is that initially you had to know in advance which bits of code required the speed increase.
Patmos's Limbix software, based on the Linux operating system, monitors and manages workloads using neural networking and fuzzy logic, two artificial intelligence methods.
This would be really fun to have at the house. More power! Argh Argh Argh!
Never knock on Death's door:
More race stuff in one place,
than any one place on the net.
You're missing one crucial point on this. If you read the story carefully, it says "Fibre Channel bus", not "gigabit ethernet". Fibre Channel is used in SANs (it's just like SCSI, except the hard drives can be a mile apart and it runs at 1 gigabit, not 160 megabit) (makes Ultra66 look like MFM), and if you look on Pricewatch, Fibre Channel will run you from $350 to $5000 for EACH COMPUTER. This does NOT include the cost of the cabling or the cost of the Fibre Channel hard drives. If you look for prices on Fibre Channel HARD DRIVES, a single 18 gig 10,000 rpm drive will run you a hair over $1000. PLUS you have the cost of the fuzzy logic development. Have you ever heard of a supercomputer that can automatically detect a failing node and reroute the traffic and load all by itself, seamlessly? Seems pretty cheap when you look at it this way.
It would be if you got something for the money! Come On. 11 K6-2/200's for $99,000. Get Real. I can almost overclock an Athlon that fast! Those chips can't cost over $25.00 each on a $70.00 motherboard! I can easily build a 16 Processor 800MHz Athlon system with 8GB of ECC RAM, and 500+GB of process storage, and a 1.2GB/S SAN to connect it all together. That is the system I can build for $99,000. It runs the same parallel code that his would run, I get a theoretical top of 12.8 GFlops though, Much more if the programs are optimized for the Athlon.
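For what it's worth, the 12.8 GFlops figure falls out of the usual peak formula if you assume one floating-point operation per clock cycle per CPU. That per-cycle figure is my assumption to make the numbers match; the post does not say how the peak was derived.

```python
def peak_gflops(cpus, clock_mhz, flops_per_cycle=1):
    """Theoretical peak: every CPU retires flops_per_cycle on every clock tick."""
    return cpus * clock_mhz * flops_per_cycle / 1000.0

# 16 Athlons at 800 MHz, assuming 1 flop per cycle each.
print(peak_gflops(16, 800))
# The same formula applied to 11 chips at 200 MHz, for comparison.
print(peak_gflops(11, 200))
```

Real code rarely gets near the peak, which is why the post hedges with "theoretical".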
1 2 3 testing testing
is it working????
(i'm tryin ta test around with it)
....
Thanx! (if it worked)
What they describe as being "200MHz" is the bus speed, and that is a fairly different matter. If you look at those AMD K6 chips, they're connecting to motherboards that have bus speeds of (in these inflated days!) either 66MHz or 100MHz. That's rather less than 200MHz.
The bus technology getting billed as a "200MHz thing" is the Alpha EV6 bus, which suggests that the CPUs in these systems are either:
- Compaq Alpha, or
- AMD Athlon.
I'd sort of anticipate the latter, but it is surprising that they are not trumpeting their use of whichever CPU they are using. The paucity of solid technical information and the proliferation of TM-this and TM-that is a bit distressing.
If you're not part of the solution, you're part of the precipitate.
AMD processors with SMP Linux, what a joke. Can you say PowerPC 7400 G4's with OS 10 or another UNIX variant?
If the version of OS X Server I saw last spring is any indication, OS 10 is a total non-competitor. It had serious problems even compiling fairly generic ANSI C code (lmbench, MPICH).
And the G4 is not all it's cracked up to be. There's not enough memory bandwidth on the PC100 bus to sustain anything close to the FP rates Motorola and Apple like to point at. There are also no vectorizing compilers for the PPC 7400; the Metrowerks compiler will do inline AltiVec assembler, but it doesn't recognize vectorizable loops automatically and it doesn't support the lingua franca of scientific computing (i.e. Fortran).
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Several months ago Slashdot featured a supercomputer-on-a-desktop that used on-the-fly reprogrammable FPGAs (Xilinx chips almost certainly) to gain massive speedup over conventional microprocessors. It featured a dynamic pre-compilation stage that fulfils a function very similar to that of the Code Morphing Software in the Transmeta products. (This general area is called Reconfigurable Computing, RC.)
Has anyone heard any more about the company that was manufacturing the supercomputers? I seem to have lost the URL.
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
I think linux is going to have to make sure that all its scalability issues are sorted out before this really kicks off. As the Mindcraft survey showed (yes, I know the tests were flawed, but the underlying problems are still there), linux has some scalability problems. Admittedly, these are being fixed, but until such time as they are fixed, Solaris, IRIX and even *BSD are going to be better options for large scale "supercomputers".
--
200 MHz K6-2s aren't on the market anymore. A query on Pricewatch for 300 MHz K6-2s turned up chips for 27 dollars. He had 11 processors; that's well under 300 dollars.
I will use 100 processors,
--- which would be 2700 bucks.
Chassis -- 32 * 100 = 3200
Fiber net cards -- 150 * 100 = 15000
RAM -- 64 * 100 = 6400 (say 64 MB is 64 bucks)
HD/MB & misc -- 100 * 100 = 10000 (need not be fast and need not be that much)
Total = USD $37300 (37% of what he quoted). And this is with 100 300 MHz K6-2s instead of 11 200 MHz K6-2s.
All prices quoted from Pricewatch's listings (especially CPU & NIC).
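The estimate above is easy to re-run with different prices. This sketch just restates the post's own numbers; the $100,000 figure used for the percentage is an assumption based on the "under $100,000" price in the story.

```python
NODES = 100
QUOTED_PRICE = 100_000  # assumption: the "under $100,000" figure from the story

# Per-node unit prices in USD, as given in the post.
unit_prices = {
    "K6-2 300 MHz CPU": 27,
    "chassis": 32,
    "fiber net card": 150,
    "64 MB RAM": 64,
    "HD/MB & misc": 100,
}

def cluster_cost(prices, nodes):
    """Total hardware cost when every node gets one of each part."""
    return sum(prices.values()) * nodes

total = cluster_cost(unit_prices, NODES)
share = total / QUOTED_PRICE

print(f"total: ${total}")
print(f"share of quoted price: {share:.0%}")
```

Swapping in the numbers from the reply (four network cards per node at $150 each, $150 for the board, case and disk) is a matter of editing the dictionary.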
--
A (true) cautionary tale: company A develops some software for company B. Company B provides several high-end big money machines (multiple pentiums, hot-swappable SCSI raid array, rack mounted, etc).
The development process (which goes through a couple phases) takes more than two years. When the project is done, Company B will probably abandon the servers because (relative to what's available today) the machines are no longer worth the shipping costs it would take to come and get them.
The moral of the story is a corollary to Moore's law: the power of today's high-end super computer will very soon be matched by tomorrow's mid-range workstation (and then low-end home system, and then embedded chips...)
My Palm VII has more RAM in it than the mainframe machine I wrote code for as an undergrad <mumble> years ago...
150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).
The NBoxen in a Perpetua are inter-connected and are part of a network that has a minimum of three layers consisting of an input layer (explicate), a middle layer (implicate) and an output layer (explicate). The middle layer (implicate) is hidden and cannot be directly observed from outside. The input and output nodes (Limbix) shape the specific problem.
The cool phenomenon of a PATMOS Perpetua is that its remarkable neural network operates in the hidden or "implicate" NBoxen, which are the mathematical forum in which the system inter-relates the "explicate" input and output signals. This is the forum in which the decisive calculations operate, by which complex, non-linear relationships are learned and the solutions evolve.
The "implicate" layer is able to deal with the very "noisy" data as it searches for patterns hidden in the noise. Because of the configuration of Perpetua and the presence of Limbix controllers, the system can deal with both the data which responds to hard laws and the end data which is not inimitable to hard laws because the underlying process is unknown.
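Stripped of the vocabulary, the description above is an ordinary three-layer network: an input layer, a hidden layer, and an output layer. The following is a generic textbook sketch of a forward pass through such a network. The layer sizes, weights and sigmoid activation are all made up for illustration, and nothing here is claimed to be how Perpetua or Limbix actually works.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)   # not visible from outside
    return layer(hidden, output_w, output_b)

# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output.
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.2, 0.4]]
output_b = [0.05]

result = forward([1.0, 0.0], hidden_w, hidden_b, output_w, output_b)
print(result)
```

Only the inputs and the final outputs are observable to a caller; the hidden activations stay inside `forward`, which is all "hidden layer" means in this sense.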
It is my view that there would be no golden world of Linux without Richard Stallman (RMS). I and my company support the Free Software Foundation, and believe that RMS should be nominated for the Nobel Prize. To tread on the principles laid down by RMS would in my view be anti-social behavior. In the near future we will be posting our source code under the GPL.
When we pushed the button on the spreadsheet and realized that we could sell super computers for the prices that came up, we had to rush to the side of our Chief Financial Officer, a Harvard MBA, and hold him up.
While we appreciate the exuberance of the press, we must tell you they occasionally make critical errors, one of which concerns the speed of the CPUs of our device. Indeed, our choice is Athlon. For those who question the scalability of Linux, we agree with you. Take a look at DSI at Indiana University. While we would like to go into greater detail in this matter, we must abide by certain biological principles regarding gestation. We have a lot more ideas about a co-processor option, which may allow us all to have our cake and eat it too. Finally, certain statements in the press bring a smile to my face; "Super Computers for the Rest of Us" is one such statement.
Perpetua is a super computer that can act as a high availability server, and they do go past 8 nodes. In fact we can make a Perpetua with any number of nodes you want. I wait for Sledgehammer with bated breath.
Sincerely,
James A. Gatzka
CEO Patmos International Corp.
(Disclaimer: I work for Ohio Supercomputer Center but don't speak for them, yada yada yada...)
This seems to be aimed more at the high availability (HA) market than the high performance computing (HPC) market. Comparing with a Compaq Himalaya is *not* a way to win points with HPC centers, because HPC centers don't buy Himalayas -- they buy mostly various breeds of Crays, SGI Origins, and IBM SPs, with a smattering of Beowulf clusters and large Sun configurations as well. The Patmos site also doesn't talk about floating point performance, which the HPC centers consider critical.
The Patmos site never really describes their systems as "supercomputers" (although the phrase "super system" is used once or twice), so this seems like bad reporting and/or a misunderstanding of what a supercomputer really is on CNN's part.
(In case you're wondering what I consider a supercomputer, I personally think a super is anything capable of multiple GFLOPS that is used for scientific computations.)
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
A "supercomputer", by a more professional definition, is a computer that runs at least 100 times faster than your "average" computer. Often, it's more like 1000. Mainstream news agencies like CNN, CNET, etc. seem to like to use this word for just about anything with more than 8 processors. SGI doesn't even consider its Origin line to be a supercomputer until it passes at least 32 processors.
As far as this product goes, I think it's got a good place in the dedicated server business, and possibly some low end batch computation. I do have to admit, using AI concepts for system monitoring was a pretty neat trick!