SGI And /Massive/ Linux Machine
Hi all,
Just thought I would send out a note outlining the state of the mips64 port. Ralf, Ulf and I have been actively working for the past few months to bring up Linux on the SGI ccNUMA machines.
The executive summary: we have achieved multiuser boot on o200 and o2000s. The largest configuration is a 32p, 16node machine (only approx 4G worth of memory was populated over the 16 nodes, the system can take 4G * 16 node worth of memory). This machine has 10 PCI busses, with 24 scsi controllers and 10 disks. (Sample output is at
If you are interested in the system architecture and details of the port, read on. The o2000s use the R10000 series of MIPS processors. Each machine is composed of modules; each module has 4 node boards, with a maximum of 2 cpus and 4G memory on each node, plus IO boards and routers. In a module, the two alternate node boards are each connected to an XBOW. Each XBOW may be connected on the other side to a number of PCI busses, which is what the IO boards connect to. Apart from this, there are routers in the system that provide connection paths between all memory and all cpus, to create a true CC-NUMA architecture.
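As a rough check on the figures above, the per-node limits (2 cpus and 4G per node, 4 node boards per module) can simply be multiplied out. This is a minimal Python sketch; the function and constant names are made up for illustration and are not part of the port.

```python
# Per-node and per-module limits as stated in the post above.
CPUS_PER_NODE = 2
MEM_GB_PER_NODE = 4
NODES_PER_MODULE = 4


def machine_limits(nodes):
    """Return (modules, max_cpus, max_mem_gb) for a machine with `nodes` node boards."""
    # Round up: a partly filled module is still a module.
    modules = -(-nodes // NODES_PER_MODULE)
    return modules, nodes * CPUS_PER_NODE, nodes * MEM_GB_PER_NODE


modules, cpus, mem_gb = machine_limits(16)
print(f"{modules} modules, {cpus} cpus, up to {mem_gb}G of memory")
```

For the 16-node machine this gives 32 cpus and a 64G ceiling, which matches the "32p" and "4G * 16 node" figures quoted in the summary.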
On the software side, we are still struggling with compiler and binutils issues. The kernel itself is 64 bits, created by cross compiling on an ia32 box. We have not attempted 64 bit user program compilation or execution. The root disk is currently very close to the MIPS/Indy root disks. The architecture specific code uses the CONFIG_DISCONTIGMEM code to support memory on all nodes. The architecture specific NUMA features currently are: 1. replicate the kernel text on all nodes, so that no one node becomes a memory hot spot (unfortunately, the kernel data has to reside on only one node). 2. replicate low level exception handler code on all nodes. The architecture code also turns on CONFIG_NUMA to take advantage of node-local page allocations. (A CONFIG_NUMA patch that I have been submitting to Linus was put into the kernel in test6-pre1). For more information on NUMA and ongoing work, refer to
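The idea behind node-local page allocation can be illustrated with a toy model: a page request is served from the requesting cpu's own node when possible, and spills over to another node only when the local one is empty. The Python sketch below is purely illustrative. The `Node` class and `alloc_page` function are hypothetical names, and this is not how the kernel code is written.

```python
class Node:
    """One NUMA node with a fixed number of free pages (toy model)."""

    def __init__(self, node_id, free_pages):
        self.node_id = node_id
        self.free_pages = free_pages


def alloc_page(nodes, local_id):
    """Pick a node for one page: the caller's node first, then the others.

    Returns the id of the node that supplied the page, or None if every
    node is out of memory.
    """
    # Try the local node first, then the remaining nodes in id order.
    # (A real system would order the fallbacks by distance; this toy does not.)
    order = [local_id] + [n for n in sorted(nodes) if n != local_id]
    for node_id in order:
        node = nodes[node_id]
        if node.free_pages > 0:
            node.free_pages -= 1
            return node_id
    return None


nodes = {i: Node(i, free_pages=2) for i in range(4)}
# A cpu on node 1 asks for three pages: two come from node 1, the third spills over.
print([alloc_page(nodes, local_id=1) for _ in range(3)])
```

The first two pages come from node 1 and the third from node 0, which is the behaviour "node-local" is meant to suggest: stay local while you can, go remote only when you must.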
The purpose of doing this port is to boot Linux on the bigger systems that we have, in order to do cpu/memory scalability studies. This also lets us do NUMA performance work in the future. Another advantage is being able to leverage this work on the upcoming SGI CC-NUMA Itanium boxes, which will be an SGI supported product. Initial results from scalability studies using mips64 are documented at
The OSS SGI site.
Kanoj
Huh? This system can do genetic pattern matching, but it's far less cost effective than a pile of small machines. Fortunately, the people who actually spend millions of dollars on machines to solve problems like gene matching investigate the problem more carefully than your friend.
Two companies doing this problem are Celera Genomics and Incyte. Incyte has a cluster of 1,200 x86 machines (3,000 cpus) running Linux. Celera Genomics has a cluster of 1000 Alpha cpus in 250 nodes; Celera purchased their machines before it had been shown that Linux could handle that kind of task.
And a company that specializes in getting fast storage for the movie industry is MountainGate.
I'm not so sure that even the rendering example is really valid. Much rendering treats rendering as an embarrassingly parallel problem: individual frames slow, entire movie fast. That's much more cost-effective.
Pretty poor troll there, friend. Linux != x86. How many times must I tell you this? Linux is a nice OS. Not perfect, just nice. x86 is a shitty architecture. Not merely bad, shitty. There are many things that peecees cannot do and will never do. There are a few things that Linux cannot do; perhaps it will do them in the future. *thwap*clue stick*
Unless you're talking about this.
I don't need large brains to have a good time.
If you look at the linked info you will see that:
a) There are in fact 14 scsi devices attached. (13 drives and a cdrom).
b) Even so only 4 of the 24 scsi hosts are actually used (So 20 scsi hosts are being 'wasted', not 10).
Your initial question ('There isn't anything special about 10 drives, so why have 24 scsi buses?') was backwards. They are developing on a big-arse piece of machinery here. The point here isn't making efficient use of 14 scsi devices, it's showing that Linux can run and access 24 scsi buses. Your question should probably have been 'If they want to really show that you can use 24 scsi hosts, shouldn't they have a shitload more drives?'. Quite possibly for a proper demonstration, but for a dev box, scattering a few drives over a few hosts is probably satisfactory.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
I agree with the remark that most places render slow frames on a massively parallel system. We made the same choice: 40 dual 600MHz Linux boxes.
But... we also had lighting tests and renders for marketing that needed to be done quicker than a nightly turnaround. That is when we used the O2000. We turned this machine into our file server, so now we can only dream of the days when we could render single frames faster.
In a perfect world, both systems would exist: a bunch of Linux boxes for 24x7 renders, and a massively parallel box for large or quick-turnaround single frames that could also be used for "normal" renders 24x7.
-I just work here... how am I supposed to know?
You can see why they ditched Cray supercomputers. They noticed that businesses want cheap processing power, and they don't care how they get it. If you want economical, "Beowulf" clusters are the way to go nowadays.
I am surprised it has taken them this long to get some deals like this out the door.
bash-2.04$
bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
SGI really does seem to be going after Linux. I recently took an RHCE test in Dallas, and out of 13 people in the class, 11 were from SGI. Kind of a trip.
I'm not trying to undermine the efforts of these guys, because I'm sure what they're doing is valid and is actually quite interesting in itself, but I'm having problems trying to see the commercial benefit of doing this. Above we are told that the purpose of this project is to "boot Linux on bigger systems", but that doesn't really seem like a viable piece of research in many ways.
:-)
Commercially, if I want lots of nodes (16 nodes here) with Linux, I'm more likely to think Beowulf. If I want them to all appear as one machine, then to be honest, if I'm spending this sort of money, I can see the benefits of going with Sun and Solaris. If I want lots of virtual Linux machines running on one large, easily manageable system, then we already have Linux on S/390.
Can anybody tell me what the real commercial incentive is to run Linux on bigger systems? I'm just curious that's all. Perhaps I'm missing something here (almost certainly I'm sure).
This is a great machine for rendering or any other application that is both CPU and memory bound.
Some jobs do not parallelize well, such as individual frame rendering. With 24 boxes, the 5 + minute overhead of loading the scene file, plus the memory spent on loading the textures and the geometry, would be paid on each machine, costing you 24 times the overhead of doing it on one machine. Trying to do this with a "quasi" shared memory system would remove that hideous overhead, but it would kill the network.
Doing this on a NUMA box fixes all of those problems. The memory is shared. The procs all look like one machine. The system runs smooth and well.
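The overhead argument can be put in numbers. Here is a back-of-the-envelope Python sketch, assuming a flat 5-minute scene load per machine (the post says "5 + minute") and ignoring memory and network effects; the names are hypothetical.

```python
LOAD_MINUTES = 5  # scene load time per machine, taken from the post ("5 + minute")


def load_overhead(machines, load_minutes=LOAD_MINUTES):
    """Total machine-minutes spent loading the scene when every box loads its own copy."""
    return machines * load_minutes


cluster = load_overhead(24)   # 24 separate boxes, each loads the scene
shared = load_overhead(1)     # one shared-memory machine loads it once
print(f"cluster: {cluster} machine-minutes, shared memory: {shared}, ratio {cluster // shared}x")
```

Under those assumptions the cluster burns 120 machine-minutes on loading for a single frame, against 5 on a machine where every processor sees the same memory.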
This is why SGI is still in the large graphics server environment. People want individual frames done fast.
The benefit of this being a Linux box and not Irix....
I, a huge Linux vs. Irix advocate, struggle to see why this would be good. Most of the apps that I would use are built for Irix first and then Linux (like Maya's renderer). I can see where others might have custom apps to use this, but the code would probably port to Irix just as easily as it would to Linux on MIPS.
It is a step in the right direction, IA64 NUMA boxes running linux. The ultimate in render farm machines.
-I just work here... how am I supposed to know?
> Discovered 32 cpus on 16 nodes
;-)
Why does my kernel not discover something like that?
Linux runs quite well on the Indy. My Distro is available, and there is a Debian port in the works. See also The SGI site and The Unofficial site. There's even X available now for some configurations. This port isn't production-ready, but it's certainly ok for casual use.
Wanting only one OS would be like wanting only one tool in the tool box. Use the right tool for the job.
From a less abstract perspective, I would rather want just NT than just one of any Unix OS. NT has more apps and is generally a better fit for that lowest common denominator spot. Of course, I don't like using sporks; I prefer a spoon and a fork.
Then again, to finish out my wishy washy opinion this morning, it might be best for SGI to get out of the OS business. If SGI, IBM, and Compaq get out of the OS business (transitioning to Linux or some other common code base), then they might be able to leverage each other and focus on less redundant tasks.
need coffee....
Joe
Joe Batt Solid Design
> The benefit of this being a Linux box and not Irix.... I, a huge Linux vs. Irix advocate, struggle to see why this would be good. Most of the apps that I would use are built for Irix first and then Linux (like Maya's renderer). I can see where others might have custom apps to use this, but the code would probably port to Irix just as easily as it would to Linux on MIPS.
:-)
If they can get Linux to run on most of their machines, then they will get access to all (or most of) the Linux stuff, and they don't have to maintain the kernel themselves (which I believe is not a cheap thing to do).
Furthermore, wouldn't it be nice to have an "IT-infrastructure" with only one OS to support: no Irix, AIX, Windows, etc., only Linux.
If this is completely wrong, then it is most likely because I don't know much about system administration or Linux/Unix generally.
I am glad to see some work being done on Linux to add real support for truly massively parallel systems. It has always been said that Linux does not scale well past a few processors (perhaps 4 at most) because modifying Linux to support systems with larger processor counts would hurt performance on low end hardware. Additionally, one can assume that the kernel developers in general don't have access to such massively parallel architectures.
This little project hopefully will prove that it can be done, and one might hope its results will be applicable to less exotic multiprocessor hardware (say an 8 or 16 way x86 server).
-josh
I don't completely agree with that.
Beowulf is not good for rendering. Each job can have up to 500-700 megs of memory in use. Try to share that over 100bT, Fibre, or some other network link, and it won't work.
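To put a number on that, here is a best-case estimate for moving one job's working set across a 100 Mbit/s link. The Python sketch assumes the full line rate is available, 1 MB = 10^6 bytes, and no protocol overhead, all of which are optimistic; the function name is made up for this example.

```python
def transfer_seconds(megabytes, link_mbit_per_s):
    """Best-case time to push `megabytes` over a link, ignoring all overhead."""
    bits = megabytes * 8 * 10**6          # assumes 1 MB = 10^6 bytes
    return bits / (link_mbit_per_s * 10**6)


for mb in (500, 700):
    print(f"{mb} MB over 100 Mbit/s: at least {transfer_seconds(mb, 100):.0f} s")
```

Even in this idealized case a single copy of the data takes 40 to 56 seconds to cross the wire, and that is before several nodes contend for the same network.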
We use other approaches for rendering. We spread the shot over a machine, not the frame. We eat the overhead of starting the renderer and reading the file. If possible, for those users who needed one frame done fast, we threw it on our 4 proc O2000. That machine was taken from me, so now they just have to wait 4 times longer.
Beowulf has its uses. Production rendering is not really one of them.
-I just work here... how am I supposed to know?