Multiprocessor G3/G4 Boards
giminy writes: "These boards from TotalImpact look pretty nifty. Each one can take four G3s or four G4s and goes in a regular PCI slot, and get this, they can run in Intel machines. They work by having a program dumped to them like a second computer. The cards are still kinda pricey, but you can put as many of them in your server as you want for something super-scalable. Linux support is there, and datasheets are available." We mentioned these back in '98, but a lot has changed since then. I'm sure there are clever uses for a couple of spare CPUs in a box ;)
They're more like "computing peripherals": PCI cards that fit in any PCI slot. (Well, any slot conforming to the right specs; one of their 66 MHz/64-bit PCI cards won't work in your average box.)
These cards will not run standalone or as a primary processor; they're slave processors. You still need a host processor, which can be whatever you want (Intel, SPARC, Alpha, PPC, probably even StrongARM).
retrorocket.o not found, launch anyway?
If you need to constantly go over the PCI bus for everything (memory, disk, etc) then yes, you'll run out of bandwidth real quick.
However, the board appears to have a lot onboard, so the bandwidth requirements are lower, which leaves you with a "black box" scenario. You have an image you need manipulated, so you send it to the G4 board along with the manipulation instructions. The board gnaws on it for a while without touching the PCI bus, then returns the modified image.
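The black-box flow can be sketched as a toy model. `OffloadCard` and `brighten` are hypothetical names; the class stands in for the board and only counts the bytes crossing a simulated bus. No real TotalImpact driver or API is implied. The point is that the bus cost depends on the input and output size, not on how long the card works:

```python
class OffloadCard:
    """Hypothetical model of a slave-processor board: the work happens in the
    board's own memory, and only the input and the result cross the PCI bus."""

    def __init__(self):
        self.bus_bytes = 0  # bytes moved over the simulated PCI bus

    def run(self, image: bytes, op, passes: int = 1) -> bytes:
        self.bus_bytes += len(image)      # host -> card: upload the image once
        for _ in range(passes):           # card gnaws on it locally;
            image = op(image)             # no bus traffic inside this loop
        self.bus_bytes += len(image)      # card -> host: return the result
        return image


def brighten(pixels: bytes) -> bytes:
    # Example manipulation: add 10 to each 8-bit grayscale pixel, clamped at 255.
    return bytes(min(p + 10, 255) for p in pixels)


card = OffloadCard()
out = card.run(bytes([0, 100, 250]), brighten, passes=5)
print(list(out), card.bus_bytes)  # [50, 150, 255] 6
```

Five passes or five thousand, the simulated bus still carries only the upload and the download.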
-- Ever notice that fast-burning fuse looks exactly the same as slow-burning fuse? I didn't... (Edgar Montrose)
Ok, this is kinda cool: you can put lots of processor power in one box. Of course, you'll probably have a bottleneck at the bus, so it won't actually be that fast for a lot of stuff. The real question is: what the hell am I going to run on it?
I mean, it's Mac chips that will most likely go into PCs. No off-the-shelf software will run on this thing because it's too freaking weird. Definitely not Windows (but so what) and most likely not MacOS either (ditto). I'm also betting you can't just throw Mandrake on it and get it to work. This company is going to put out a custom Linux distro just so they can get some practical use out of the concept.
I mean, if you're not going to use Open Source software with this thing, you may very well be up the proverbial creek. That's not a problem for many Slashdotters, but if I want to run a commercial analysis package that's available only in binary form, this architecture is probably right out.
So far I've gotten all my Karma from telling people they are wrong... :)
What about the larger x86 servers from Unisys and Sequent which have DOZENS of PCI slots?
G4/500s are rated at around a GFLOPS. So about 4 GFLOPS per PCI slot? Some servers have 32, 96, 100+ 64-bit 66 MHz PCI slots... there's a thought. Heh.
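Here is the slot arithmetic as a peak figure only. It uses the post's rough rating of about 1 GFLOPS per G4/500, which is an assumption carried over from the post and not a measured number. It also ignores the bus and memory limits raised elsewhere in the thread:

```python
GFLOPS_PER_G4 = 1.0   # the post's rough rating for a G4/500 (assumption)
CPUS_PER_BOARD = 4

def peak_gflops(slots: int) -> float:
    """Peak GFLOPS if every slot holds a quad-G4 board (no bus or memory limits)."""
    return slots * CPUS_PER_BOARD * GFLOPS_PER_G4

for slots in (1, 32, 96):
    print(f"{slots:3d} slots -> {peak_gflops(slots):.0f} GFLOPS peak")
```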
PCI beats Ethernet any day, but most scientific clusters use Myrinet or SCI, which run at about 1 Gbps full duplex, while PCI is about 4 Gbps half duplex on a 64-bit/66 MHz bus. This means that as soon as you have more than two of those babies in your computer, your PCI bus is actually slower than a switched Myrinet or SCI interconnect would be.
The other problem is that the bus on the card is too slow to handle four CPUs. In our experience, anything over two CPUs in a single machine causes bottlenecks. Except on SGIs with ccNUMA, of course, which can easily handle eight CPUs per machine.
Memory is also a bit tight: we usually need about 512 MB per CPU, and this thing has 512 MB for all four CPUs.
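The ~4 Gbps PCI figure comes from the fact that a parallel bus's theoretical peak is its width times its clock. Here is a small sketch of that arithmetic. It uses decimal units, so 64 bits at 66 MHz comes out at about 4.2 Gbit/s, which the posts round to "about 4 Gbps". The Myrinet/SCI number counts both directions of a 1 Gbit/s full-duplex link, as the post does:

```python
def bus_gbps(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus that moves width_bits per clock."""
    return width_bits * clock_mhz * 1e6 / 1e9

pci = bus_gbps(64, 66)   # ~4.2 Gbit/s, shared by all devices, one direction at a time
myrinet_sci = 1.0 * 2    # 1 Gbit/s each way simultaneously, both directions counted
print(f"PCI 64/66: {pci:.2f} Gbit/s, Myrinet/SCI: {myrinet_sci:.1f} Gbit/s per node")
```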
Well, that's my NSHO and experience.
Here's Motorola's G4 fact sheet. The real lowdown on the G4 is here. Especially check out the hardware spec. (The link seems to be broken or something, though; I looked at it a few weeks ago.) :(
G4s support a range of clock divisors for the external L2 cache SRAMs, from 1:1 to 4:1. Apple uses 2:1 in their towers. (BTW, the cache RAM is external, but the control logic is all on chip.)
The TotalImpact page doesn't say what speed they run the L2 cache at. (The PDF spec sheet link is broken.)
#define X(x,y) x##y
Peter Cordes ; e-mail: X(peter@cordes ,
A fair number of i840 boards have them too. It's a shame you need RIMMs, but if you can afford one of these boards you can probably afford the RIMMs.
Looks like they are working on a Linux-specific product too...
I used to use a 386 with an array of 16 T800 transputers to render. Each board of 4 transputers had 4 megs of RAM, as well as one meg for each transputer as cache. They communicated along a dedicated back bus.
This was used for RenderMan rendering with the old Digital Arts DGS system. The main processor would split the job into 16 x 16 pixel "buckets" and send the pre-clipped scene data (geometry, lighting, surface information) as well as a portion of the textures used in the scene. As each transputer finished the contents of its bucket, it would dump it back along the ISA bus to the Targa framebuffer.
That's the sort of process these are useful for: not SMP, but assisted special-purpose processing.
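The bucket scheme is easy to sketch. This toy version uses a thread pool as a stand-in for the 16 transputers and a dummy `render_bucket` (hypothetical) in place of real RenderMan shading. It splits a frame into 16 x 16 buckets, clips the edge buckets, and lets each idle worker take the next bucket:

```python
from concurrent.futures import ThreadPoolExecutor

def buckets(width, height, size=16):
    """Yield (x0, y0, x1, y1) for each size x size bucket, clipped at the image edge."""
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            yield x0, y0, min(x0 + size, width), min(y0 + size, height)

def render_bucket(bucket):
    # Hypothetical stand-in for one transputer shading its bucket;
    # the "result" here is just the bucket's pixel count.
    x0, y0, x1, y1 = bucket
    return bucket, (x1 - x0) * (y1 - y0)

def render(width, height, workers=16):
    # Every bucket goes into the pool's shared work queue; whichever worker is
    # idle takes the next one, much like feeding transputers as they finish.
    # map() still returns the results in bucket order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_bucket, buckets(width, height)))

tiles = render(100, 50)
print(len(tiles), tiles[-1])  # 28 ((96, 48, 100, 50), 8)
```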
"How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
\subject.
--
Sheesh, evil *and* a jerk. -- Jade
Temper, temper. Calm down. Witness the "Slot A" slot on my Athlon. It looks physically identical to the connector on a PII mobo. Do ya think either will work in the other, though? No, which is why you read the tech specs.
Temper, temper? To you I say foolish, foolish. I did read the specs (you did not, because you obviously did not read the site before you posted; I did). It could be PC66, 5V DIMMs; it doesn't matter. It is a DIMM slot, and as you can see (that is, if you have even read it at this point, something I am beginning to doubt you ever will do), these DIMM slots are CLEARLY occupied by cards with memory chips on them. Do you plan to argue that they might not be RAM chips in order to justify your original post?
Furthermore, I hardly consider a visit to ONE vendor site a reasonable view of the market. Just recently I have read several reviews of 400W and 450W consumer power supplies. This board is clearly not designed for consumer use: it is a 66 MHz, 64-bit PCI device, a slot not found on consumer-level motherboards, but you would have known this had you visited their page by now, wouldn't you?
This may be considered a flame, but the root post of this thread is obviously a troll. If you are going to make statements about a product, please inform yourself about the product in question. Don't be the hardware equivalent of those foolish people who protest against movies they haven't seen.
NightHawk
Tyranny =Gov. choosing how much power to give the People.
Great use for the new bus. Too bad most traditional PCs don't support either yet. The only machines I know of that have 64/66 are the UltraSPARC-based machines.
Or are there x86 boxes out there that have it?
So if you can put a bunch of these in a rackmount with a gig or two of RAM, wouldn't it be a cheaper alternative than a Beowulf cluster?
Sound is handled by a sound card,
Graphics is handled by a graphics card and now...
processing is handled by a processing card.
Cool.
Maybe they have a huge cache on the board? Also, as another poster mentioned, what are the power requirements? I have a 300W power supply, and 250W is already sucked up by *just* the CPU + mobo. I know the G4 has low power requirements, but can the mobo supply much more than it does now?
No kidding. My G3 gets tanked at least twice a week, and cleaning up after it is becoming a freakin' nuisance. Jose Cuervo and coolant paste make a horrible reek, and don't even get me started on the effects of black coffee on a PowerBook's keyboard...
-----------------------------------------------
All employees must wash hands before seeking equitable relief.
1) Heavy processing, obviously. Crypto, graphics...
2) Bunch of servers in 1 box.
3) Gaming ^_^
Eh...
The boards run Linux. I can't figure out what type of parallel processing they use. On the one hand, they refer to mapping all of the memory on up to 8 boards into a single address space (like SMP), but on the other, they also make a product to use MPI (like Beowulfs) on MacOS.
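If the "single address space" claim means what it sounds like, host software would have to translate a flat address into a board number and an offset. Here is a minimal sketch. It assumes each board's memory is mapped contiguously in board order at its 512 MB maximum. That layout is a guess for illustration, not TotalImpact's documented scheme, and `locate` is a hypothetical helper:

```python
BOARD_MEM = 512 * 2**20   # bytes per board (the 512 MB maximum quoted in the thread)
MAX_BOARDS = 8

def locate(addr):
    """Translate a flat address into (board, offset within that board).
    Assumes, hypothetically, that board i occupies [i*BOARD_MEM, (i+1)*BOARD_MEM)."""
    if not 0 <= addr < MAX_BOARDS * BOARD_MEM:
        raise ValueError(f"address {addr:#x} is outside the mapped boards")
    return divmod(addr, BOARD_MEM)

print(locate(0), locate(BOARD_MEM + 42))  # (0, 0) (1, 42)
```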
If they are less than about $2500 for a quad G4 board, this may be even cheaper than the KLAT-2 cluster's $650 / GFLOPS discussed here a while back.
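Here is the rough arithmetic for that comparison. It assumes the ~1 GFLOPS per G4 figure mentioned earlier in the thread and counts only the board price, with no host machine. The board number is a peak estimate, so the comparison probably flatters the board. The $6500 case is the quad-G4 price quoted elsewhere in the thread:

```python
KLAT2_DOLLARS_PER_GFLOPS = 650   # figure cited in the post

def dollars_per_gflops(board_price, cpus=4, gflops_per_cpu=1.0):
    """Price per peak GFLOPS for a quad board (gflops_per_cpu is an assumption)."""
    return board_price / (cpus * gflops_per_cpu)

for price in (2500, 6500):
    cost = dollars_per_gflops(price)
    verdict = "cheaper" if cost < KLAT2_DOLLARS_PER_GFLOPS else "pricier"
    print(f"${price} board: ${cost:.0f}/GFLOPS, {verdict} than KLAT-2")
```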
There are a large number of ways to hook multiple CPUs together, many of which have been tried, but only two have been successful: symmetrical shared-memory multiprocessors (SMP), and networked clusters. Many millions of dollars of government money have gone into R&D on nifty ways to hook lots of CPUs together to build a supercomputer, starting with the Illiac IV (1970s), the Connection Machine (1980s), and the BBN Monarch (1990s). None of these led to anything people wanted to buy, even people with big problems and budgets. Vanilla architecture wins again.
http://www.totalimpact.com/G3_MP.html
:-(
Notice that:
1) the PCI card is taller than standard height, which limits the number of desktops that can use it. Hence: PC/*AT* & older PowerMacs
2) "possible" interface cards... read: a PMC site is available, but the software drivers need work.
3) now ask about parallel abstraction layers & tools...
Large parallel systems are quite useful here, but my Total mPOWER boards have (so far) been less useful than the packing material they were shipped in.
Oh, no doubt, mainframes still make good sense in a variety of situations. However, reinventing such a niche item from the ground up seems a pretty poor idea. (Just ask SGI.) :-)
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
Sure, I guess you'd know the difference between a DIMM slot made for CPU cache vs. main memory. I don't know anything about Macintosh hardware; for all I know, those DIMMs were there for caching accesses to main memory.
From the article:
Memory:
Two 168 DIMM sites, support for up to 512Mb of SDRAM, 3.3V, unbuffered PC-100 DIMMs.
As distinct from:
Level 2 Cache:
1Mb of L2 "Backside Cache" per processor.
The board design would be far simpler with the DIMMs as dedicated memory instead of cache, the DIMMs are described as "memory", and the article makes no mention of direct access to system memory. The only reasonable conclusion is that the DIMMs are standalone memory, as the previous poster pointed out to you.
10% of comments - lameass dumb trolls.
20% of comments - can you imagine... a Beowulf cluster of these?
30% of comments - actually, I'm a really really smart bloke and I know everything about everything, so moderate this comment up!
35% of comments - karma whorin' - come on, siggy, you _know_ you're gonna post simply to collect yet more karma. What was it last time I checked? 750? I thought so...
5% of comments - I love Microsoft, please flame me. LOOK! Here's my private "business correspondence only" email address; why don't you hit my corporate email server with a nasty DDoS just because I'm obviously a secret MSFT lover and must be stopped at all costs.
1% of comments - Really really fscking irritating statisticians who just _have_ to tell me that I can't add up...
--
Jon.
http://www.jonmasters.org/
For starters, as others have pointed out, these are slave processors, so by definition, putting one in does not make an SMP box. The S in SMP stands for symmetric, and while the CPUs on the card are symmetric with each other, the card is not symmetric with the main CPU(s).
The way this works is much closer to a mainframe running VM with partitioned systems underneath it. You submit a job by tossing it over the wall to the VM partition (in this case one of these cards) and wait for it to toss the results back. You can probably watch the job some way with a properly written VM subsystem. You probably can't run interactive programs on these cards and if you can, you really wouldn't want to since you would clobber the PCI bus sending keystrokes and screens back and forth. And don't even think about trying to run a GUI on one of these cards.
What these cards are perfect for is batch processing. You write a queuing mechanism to accept jobs and farm them out to the cards as they become available. The main CPU manages the UI and the queue. The cards have their own memory (max 512 MB, which is not a lot for this type of work), so you can get reasonable performance as long as the data sets are small enough to be loaded into memory on the card.
What this means is that the type of processing you can do with these is limited by the PCI bandwidth and the memory on the card. I don't think this is as great and wonderful as it looks. It's really cool, and if you need to run lots of compute-intensive programs with smallish data sets, then this is ideal, but it will choke on high transaction rates and large data environments. Databases are an absolute no-no unless you really hate your PCI bus and want to try to burn it out.
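Here is a minimal sketch of that host-side queue. It assumes one worker thread drives each card and that each job's dataset size is known up front. `run_batch` and the `(dataset_mb, fn)` job format are hypothetical, and a plain function call replaces the step that would actually ship a job to a card. Jobs too big for the card's onboard memory are rejected instead of being queued:

```python
import queue
import threading

CARD_MEMORY_MB = 512  # per-card maximum quoted in the thread

def run_batch(jobs, n_cards=2):
    """Farm (dataset_mb, fn) jobs out to n_cards workers; return (results, rejected)."""
    todo = queue.Queue()
    rejected = []
    for job_id, (dataset_mb, fn) in enumerate(jobs):
        if dataset_mb > CARD_MEMORY_MB:
            rejected.append(job_id)  # would not fit in onboard RAM
        else:
            todo.put((job_id, fn))
    results = {}
    lock = threading.Lock()

    def card_worker():
        while True:
            try:
                job_id, fn = todo.get_nowait()
            except queue.Empty:
                return  # queue drained; this card goes idle
            out = fn()  # hypothetical stand-in for upload, compute on card, download
            with lock:
                results[job_id] = out

    threads = [threading.Thread(target=card_worker) for _ in range(n_cards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, rejected

jobs = [(64, lambda: sum(range(10))), (2048, lambda: 0), (300, lambda: 6 * 7)]
results, rejected = run_batch(jobs)
print(sorted(results.items()), rejected)  # [(0, 45), (2, 42)] [1]
```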
You could tear the ass off some RC5 cracking with this beast and the PPC client.
Also, SETI@Home offers a PPC client that would benefit from this.
I may have to get me a couple of these.
--
http://www.compute-aid.com/atx350w.htmls u/p ?category=power%5Fmanagement&mfg=enlight
350 watt for $55
http://www.overclockers.com.au/techstuff/r_lm400p
400 watt supply
http://www.axiontech.com/cgi-local/manufacture.as
another 400w supply
Normally I think you raise some valid points, but do your homework.
They're asking ~$4500 for the 4 G3s and ~$6500 for the 4 G4s. Each board comes with 128MB of RAM. This courtesy of http://www.xlr8yourmac.com
POP is IBM's PPC-based reference platform, which will (we hope!) allow OEMs to build inexpensive and clever PPC-based applications. Design files for the first version of POP never came out due to a bad part (the Northbridge, from Winbond); according to Brad, a "POP2" is on its way.
As always, further info is at http://www.openppc.org.
--Tom Geller
Co-founder, The OpenPPC Project
And with all those servers in a single box, you'd be reduced to a single power cord, a single UPS, a single point of failure...
Yeah, I know that you can (should, would) have multiple redundant power supplies, but if you're going to design something utterly centralised to replace a reliable server farm, you're going to end up reinventing the mainframe.
My first reaction was "Wow, more CPU power!" Then I actually thought about what we could do with that beast.
I'm sure these would be useful in niche markets like imaging/multimedia, where custom software could offload some huge operation to the card while the main CPU deals with the user interface. But that's not my field, so I have no idea of the feasibility.
Many people mentioned 'Beowulf'! Now, Beowulf is a scientific cluster, and I happen to know a fair bit about the subject, since I work for a research center.
Most scientific applications need lots of CPU power, but also lots of memory bandwidth: for example, simulating the flow of air around an airplane wing with a dataset of 5 GB...
So from the start, the CPUs' data caches are nearly useless: since we cycle through huge amounts of data, the CPU constantly reads from and writes to memory. The net result is that a standard PC can't keep more than two CPUs fed with data before the system bus becomes a bottleneck. Since the mPOWER card has a standard PC bus, only two of its four CPUs would actually be used.
Next, the memory. 512 MB isn't actually a lot for scientific clusters; that's what you usually have for each CPU. It's a bit tight, but let's live with it.
Finally, the benefit of this kind of card would be cramming a PC box with a number of them, saving money by not needing additional hard drives, cases, keyboards, cheap graphics adapters, etc.
The typical PCI bus (64-bit, 66 MHz) has a bandwidth of just under 4 Gbps. It is a bus, so only one device can use it at a time (half duplex). The usual clustering interconnect (Myrinet or SCI) offers 1 Gbps full duplex, so let's say 2 Gbps to compare with the PCI bus.
Let's also say that the host CPU in a multi-mPOWER-card setup isn't doing any actual work, leaving the bus free for the mPOWER cards.
That means you can put two mPOWER cards in a single system before each card gets less interconnect bandwidth than a standard dual-CPU machine with an SCI or Myrinet adapter. And that's before any disk or network access, which adds traffic on the PCI bus and reduces the overall available bandwidth. That's not much of a win.
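The per-card share works out like this, using the post's round numbers: 4 Gbps of shared PCI and 2 Gbps per Myrinet/SCI node. The sketch assumes the bus is split evenly among the cards. `other_traffic_gbps` is a hypothetical knob for the disk and network traffic the post mentions:

```python
PCI_GBPS = 4.0           # "just under 4 Gbps" for 64-bit/66 MHz PCI, shared, half duplex
INTERCONNECT_GBPS = 2.0  # Myrinet/SCI: 1 Gbps full duplex, counted as 2 Gbps

def per_card_gbps(n_cards, other_traffic_gbps=0.0):
    """PCI bandwidth left for each card after disk/network traffic, split evenly."""
    return max(PCI_GBPS - other_traffic_gbps, 0.0) / n_cards

for n in range(1, 5):
    share = per_card_gbps(n)
    verdict = "ok" if share >= INTERCONNECT_GBPS else "worse than Myrinet/SCI"
    print(f"{n} card(s): {share:.2f} Gbps each, {verdict}")
```

Any nonzero `other_traffic_gbps` pushes even the two-card case below the interconnect figure.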
Of course, not all applications need gobs of memory. A distributed.net-like application, where the dataset is tiny, could make use of all 8 cards in one system. I just think those applications are the minority in scientific computing.
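In distributed.net-style work, each CPU needs only a slice of the search space, which is tiny compared with a 5 GB flow simulation. Here is a minimal sketch that carves a keyspace into contiguous blocks, one per CPU across eight quad-CPU cards. The 40-bit size, the even split, and `split_keyspace` are illustrative assumptions, not distributed.net's actual work-unit scheme:

```python
def split_keyspace(bits, workers):
    """Divide keys [0, 2**bits) into contiguous half-open ranges, one per worker.
    Each worker's whole 'dataset' is just its (start, end) pair."""
    total = 1 << bits
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)  # spread any remainder
        ranges.append((start, end))
        start = end
    return ranges

ranges = split_keyspace(bits=40, workers=8 * 4)  # 8 cards x 4 CPUs
print(len(ranges), ranges[0])
```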
Yikes, it just occurred to me that much of the readership wasn't born at this point, but... :)
The bus and processor used to run at the same rate. There were many systems in which the processor plugged into the backplane just like any other card. S-100 and the PDP-11 (among others) worked this way, as did other lesser-known formats. Others took a similar approach: the Apple II exposed everything to the bus, and a processor card could flat-out take over. There were a few hybrid systems that used S-100 for expansion but had a motherboard with a processor and possibly memory.
Then processors started running faster than 4 MHz...
If these were modified to use AGP (or an AGP-like bus), that would give a tremendous advantage. PCI DMA was fast a few years ago, but AGP allows almost direct access to memory windows, which would give these processors far more bandwidth and system interaction, as well as reducing contention on a narrow PCI bus.
That said, I think this would also be good for distributed.net and SETI, or whatever other data-cruncher you happen to favor.
If you read the post, it says that these cards will work inside an x86 box, running PPC binaries built with a cross-compiler...
What would seem sweet, and maybe not too hard to do, would be something capable of running 99.9% of the binaries in existence. While we can run many programs on x86 (WINE, VMware), the PPC would let us run LinuxPPC-native and even (if you so desire, but maybe not) MacOS binaries. We won't have the ROM (maybe the New World ROM files will solve this), but we WILL have Darwin to work from for something with more WINE-like compatibility. If the New World ROM can be used, it may be possible to get something as complex as mol up on your x86 workstation. Imagine having one workstation where you, the HellDesk employee, could run *NIX (Lin/BSD, natively), VMware (WinXX), and mol (MacOS 9+)... simultaneously (ignoring the 512M RAM you'd probably need). In environments with great OS diversity, this would be great (universities come to mind).
It would be more beneficial to Mac owners to have the reverse for compatibility (putting a PIII or K7 on a PCI card in your Mac). Several companies do this (and probably have patents), such as OrangeMicro, which was anally retaining the hardware specs last I heard. And they only develop drivers for MacOS. Plus, I think they require special versions of the OSes that run on the hardware anyway.
You also have the possibility to now section off hardware to a virtual environment (similar to IBM's 390's) because you can easily quantize the resources allocated to each environment by PCI card...
- Sig
Tyan has at least one motherboard with two 64-bit slots on it, but not being familiar with the spec, I don't know what Windows can do with it. Nice motherboard, though: their newest SCSI-on-board mobo.
-Tannin Kal
Yes, but will they release it to the public so Jeff Goldblum will have something to do?
I'm not entirely sure I understand what this would mean. Would this increase the speed of the machine when running everyday apps, or (practically speaking) would it be limited to very specialized programs that hog the processor (buying a processor for your program rather than the other way around, for a change)?
PowerPC processors are not well known for their sobriety. Most people willing to add these boards to their servers should seriously think about upgrading their power supplies too, especially if they also use RAID disks or whatever.
BTW, multiprocessor (Strong)ARM-based boards are also being worked on by companies such as Simtec; given the modest power needs of an ARM processor and the low floating-point needs of a server, this is an interesting alternative (though I am not sure these are out yet).
--
Trolling using another account since 2005.
My goodness. Will you just go read the link to the story, and then come back and admit that you're wrong. And you wonder why people think you're an idiot. Idiot.
Hey, even back in 1988, the good ol' Amiga 2000 had a processor slot. I remember there were 286, 386 and 486 cards available, as well as PPC cards for the A4000. And it was *very* cool.
Man, running DOS or Windows in a window *without* emulation was über cool.
-- It's always darker before it goes pitch black.
That free PCI card slot in my G3 suddenly became much more valuable. I wonder how many G4 chips I can afford?
I wonder if this can be integrated well with the existing chips in a Macintosh via the Multiprocessing extension in Mac OS 9. Perhaps Mac OS X will make use of this great resource. Mmm... superserver!