fgodfrey · Slashdot Mirror

Re:SV1 and its friend, Origin on Cray SV1 Named Best Supercomputer for 2001 · 2001-08-12 16:12 · Score: 2

Actually, you'd be amazed at how similar the kernels are between an Indy and an Origin 3000. Obviously, there are a lot of platform-specific changes in things like error handling, interrupt handling/routing, and some tweaks to the memory allocators to deal with the NUMA architecture. But things like the scheduler, the filesystems, and the general architecture are (pretty much) the same. Well, I suppose the Origin version is 64 bit and the Indy is 32 bit.

As for system size, the 512p limit is real. With only one exception so far (NASA Ames), the largest O3000 you can get is 512p. There's a special mode that you can run in where you sacrifice half the memory capability per node to get twice as many nodes and hence a 1024p system, which is what NASA has. There is a press release on that someplace at NASA Ames and SGI but I forget where. The "special" 2048 is actually a pseudo shared memory cluster, probably using an interconnect similar to (but a lot faster than) Myrinet or using something like HIPPI. This is actually what Blue Mountain is.

As for the Linux boxes, I worked with some prototype hardware based on the Origin 3000 series "chipset" with Itaniums. It was pretty cool stuff (I was working on porting the system partitioning software from Irix to Linux). We have also run an Origin 2000 version of Linux/MIPS on a 128p system.

Re:Wrong logo, Wrong idea on SGI Installs First Itanium Cluster At OSC · 2001-08-10 07:24 · Score: 2

Err, actually, you won't need a P brick unless you need more than 5 PCI cards. One of the features we are kicking around (read "maybe someday we'll do this but don't go telling anyone that we're committed to it") for system partitioning is the ability to boot a partition I/O less from another partition. You would then be able to NFS mount a root file system over the cross partition communication layer (network interface "cl0").

Re:Wondering why that buggy is so ugly? on Hyperion Robot Follows the Sun · 2001-06-26 03:53 · Score: 2

Um, no. It's *not* as stupid as it sounds. It takes a fair amount of (mechanical) engineering talent to win buggy since the buggies free roll (i.e., aren't pushed) over a large amount of the course. It's an engineering school and you think it's dumb that it has an engineering competition? Are you just bitter you didn't get into CMU or something? Also, most of the buggy drivers I knew weren't Asian and weren't bulimic either. You can actually fit a reasonably sized (albeit rather short) person into most of the buggies I've seen.

Re:Secret writings ? on So Long, Hitchhiker: Douglas Adams Dead At 49 · 2001-05-14 05:25 · Score: 2

At a talk at Carnegie Mellon in '99, he said part of the reason for the ending was that he hated having to spend 100 pages collecting all the characters into the same place so he decided that if he made them all dead, they'd at least be in the same place.

Re:Ha on To the Moon, Alice · 2001-05-08 02:31 · Score: 2

No, the Wright Bros. did *not* just get in their plane and try to fly it. They had numerous manned and unmanned test flights of gliders before they tried the powered plane. They performed tests on the engines to make sure the motor wouldn't explode.

I think there is an award for the first person to manage a flight like this and if he succeeds, he'll get it. I, as an engineer, just can't envision trusting my life to something that hasn't been tested because about 90% of the time, that is a disaster.

Re:Ha on To the Moon, Alice · 2001-05-06 23:02 · Score: 2

Yeah, well, someone does *not* need to go up in this thing on the VERY FIRST FLIGHT! As far as I can tell, there are no test flights of the full system unmanned in the planning.

Flying on the thing after a successful test flight is risky. Flying on it before one is just plain stupid.

If he blows himself up, he deserves a Darwin Award not for trying it but for not testing it first.

Re:Can it really not be confused? on SGI Versus "Open*" and All Things "GL"? · 2001-04-02 23:08 · Score: 2

Well look, I didn't say *I* was confused by the two. I agree that any coder worth anything wouldn't be confused. But lawyers aren't coders, and they are most certainly the ones calling the shots here. The real question to ask is this: If you hire a lawyer, could the lawyer prove that the two are confusingly similar? Probably the answer to that is yes. In that case, we don't really have a *choice* on going after people because if someone ever decided to actually infringe on our trademark (say XYZ Graphics, Inc. ships a board with an incompatible API and calls it OpenGL), we couldn't go after *them* either.

So basically, what this all boils down to, is "trademark law sucks".

Re:Can it really not be confused? on SGI Versus "Open*" and All Things "GL"? · 2001-04-02 02:52 · Score: 2

I also just reread the intro to this article and it seems like /. is adding some needless hyperbole. If SGI were on a quest to rid the world of "Open" and "GL", the "OpenSSH" and "OpenBSD" crowd probably would have heard. It's more like our lawyers don't especially like "Open?L", which differs by only one letter from a trademark of ours. That is hardly "going after any name that starts with Open".

Re:Can it really not be confused? on SGI Versus "Open*" and All Things "GL"? · 2001-04-02 02:37 · Score: 1

Wow - in the time it took me to post this, about 15 people made the same point :)

Can it really not be confused? on SGI Versus "Open*" and All Things "GL"? · 2001-04-02 02:31 · Score: 3

While I'm not overjoyed to see that the company I work for is going and doing one of these stupid trademark enforcement deals, this particular one seems a little more on target than some. After all, "OpenGL" and "OpenIL" are fairly similar names and they *are* both image manipulation libraries. A little bit of a stretch, but not as bad as, say, etoy. The other two (OpenCL and AL) look a bit more dubious.

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-30 03:40 · Score: 2

Well that would be because you keep switching points. You started out saying that it is easier to admin 1000 PC's than a Cray T3E, a point which I strongly object to, and finished by saying that not all tasks require a traditional supercomputer, which I agree with and stated quite clearly at the end of the post to which you originally objected.

Also, you keep trotting out examples of things that are in the category of "embarrassingly parallel". Rendering video and chip design are two examples of that. Weather forecasting, oil exploration, particle physics (read nuclear bomb simulation), and protein folding (among many other things) are *not*. They require communication. This is why Pixar doesn't have a supercomputer and why Los Alamos National Labs does. If you don't understand why latency *dramatically* affects the speed of an MPI program, go read up on the subject and get back to me.
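To put rough numbers on that distinction, here is a toy model in Python. It is only a sketch: the 0.1 s of serial work per timestep and the 100 messages per step are made-up parameters, the function names are hypothetical, and the 4 and 100 microsecond latencies are the figures quoted elsewhere in this thread. It ignores bandwidth, contention, and overlap of communication with compute.

```python
def step_time(work_s: float, nodes: int, msgs_per_step: int, latency_s: float) -> float:
    """One timestep: compute is split evenly across nodes, then each node
    pays one latency per message it exchanges (no overlap assumed)."""
    return work_s / nodes + msgs_per_step * latency_s


def speedup(work_s: float, nodes: int, msgs_per_step: int, latency_s: float) -> float:
    """Speedup relative to doing all the work serially with no messages."""
    return work_s / step_time(work_s, nodes, msgs_per_step, latency_s)


WORK_S = 0.1   # assumed serial compute per timestep
NODES = 1024

# Embarrassingly parallel job (no messages): latency is irrelevant.
independent = speedup(WORK_S, NODES, 0, 100e-6)

# Tightly coupled job (assumed 100 small messages per step).
coupled_fast = speedup(WORK_S, NODES, 100, 4e-6)    # supercomputer-class latency
coupled_slow = speedup(WORK_S, NODES, 100, 100e-6)  # commodity-cluster latency

print(f"independent job:       {independent:7.1f}x")
print(f"coupled, 4 us links:   {coupled_fast:7.1f}x")
print(f"coupled, 100 us links: {coupled_slow:7.1f}x")
```

With these invented parameters the render-farm style job scales linearly either way, while the coupled job loses most of its speedup once latency goes up. The exact numbers mean nothing; the shape is the point.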

Finally, you don't seem to have much knowledge of networking. You don't just "add 100x the processors" and expect that the network is going to scale. That takes careful planning and hardware that you don't buy at CompUSA. I think you will find that building a really good cluster, while perhaps cheaper than a supercomputer, is a lot closer to that price range than you would expect.

As for why Intel got out of the "supercomputer" business, I suspect they got out because they had no product for which there was a compelling reason to buy one. The Paragon was, for all practical purposes, a cluster (and from what I hear, not all that fantastic of one, but I may have a biased view on that). Plenty of people sell clusters.

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-30 00:52 · Score: 2

I still don't see your point. So you had the number one spot on the top 500. That doesn't mean that anyone even *bought* your system (yes, I know Sandia bought Red). All that means is that you built something and ran LINPACK on it. I hardly see how that qualifies your statement that people would rather manage an independent cluster of 9000 systems than a T3E.

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-29 23:52 · Score: 2

No - Seymour would have been building FPGA boxes. For Seymour's last design, see http://www.srccomp.com.

As for the physics stuff being obsolete, I'd say that it's not. Now this isn't because I don't think there's something better we could be doing, but because there isn't yet....

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-29 23:47 · Score: 2

No, *nobody* would rather admin 1024 PC's. They may be *forced* to for cost and/or availability reasons, but that doesn't mean they *want* to. I'm not so stupid that I can't think of how to assign hostnames and IP's and as long as everything works, those schemes are fine. It's when stuff breaks that you get in trouble.

ASCI Red was on top because Intel threw so many processors at it. LINPACK is not really all that representative of customer code. If you can tell me (and have data to back it up) that Sandia's code ran as well on ASCI Red as it would on ASCI Blue Mountain or a Cray T3E, I would be very surprised.

Re:Questions about this Beowulf thing... on Update From Cray World · 2001-03-29 13:59 · Score: 2

Uh, if you are using a T3E with more than one processor per partition, you are using some form of "single image mode" since the mk kernel runs on only one processor.

Also, I don't know how many large T3E customers you've talked to, but I've talked to several (7 or 8) and almost all of them cited the single image as one of the reasons they love the machine, specifically as opposed to the IBM SP2. Besides, even if what you say is true, I suspect that the chunks are larger than 4 PE's per partition, which is what you'll get (effectively) in a cluster.

Finally, while you may not miss being able to do a "ps" and see all the processes on the machine, you will probably notice the longer latency, lower bandwidth, and lack of shared memory in a cluster, well designed or not....

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-29 13:52 · Score: 2

I submit that if you look at the direction clusters are going in, they are heading for an MPP supercomputer. What do Myrinet and Quadrics have that gigabit ethernet doesn't? The ability to do a write from one side to another without going through the OS. I suspect that they are trying to figure out a way to do reads that way also. They are heading for NUMA.

Also, I could be reading you wrong, but you seem to be implying that supercomputer == vector supercomputer. I am including NUMA style machines like the Cray T3E and large SGI Origin machines as supercomputers. Since your example app wasn't vector, I'll assume it was something along the lines of MPI, which would run quite well on those architectures.

Re:Questions about this Beowulf thing... on Update From Cray World · 2001-03-29 11:29 · Score: 2

Well, remember that the MTA is not being worked on by the remnants of Cray Research :) From what I hear, the SV2 OS is coming along quite nicely.

And yeah, I forgot to include the MPI/shmem library and compiler in my list of what they would probably port.

As for Unicos/mk, the real distinction there was that, at the user level, it looked no different from other Unices. It is able to present 1800 different kernels to the user as if a single OS were running. No small feat in OS development :)

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-29 11:22 · Score: 2

Yeah, special hardware assists for programming models seem to have disappeared. We are planning some for future SGI machines, though, and interconnects like Myrinet have sorta kinda assists for MPI.

As for the scheduling, at least on Irix, and I assume Unicos/mk, the scheduling/memory management is good enough that you can run multiple jobs. The way it works on Irix is that you can dedicate a certain set of CPU's and memory to a job and other sets to other jobs. That gives the job dedicated access to only the amount of hardware that it "needs" (or the programmer thinks it needs, anyway). I know similar stuff exists for the Cray T3E, but I'm not familiar with the details.

Re:Questions about this Beowulf thing... on Update From Cray World · 2001-03-29 07:05 · Score: 2

I'm not sure what exactly they meant by "the cluster will run the T3E operating system even though each node will run Linux". That is a contradiction. The T3E ran something called Unicos/mk which was a full blown Unix-type OS. In any case, what they probably really meant was that they will port the load balancing, gang scheduling (scheduling a job onto a group of processors) and some sysadmin tools. They will probably also port some of the process accounting tools. I suppose they could port Unicos/mk's kernel servers to Linux instead of the Chorus microkernel which is what it runs on T3E, but that would take a *lot* of work and may violate some of the agreements signed with SGI before the spin-off. I also doubt they have the resources to do this and continue to develop the OS's for the MTA and the SV2 (two different OS's) plus maintain Unicos (for the SV1 and previous vector systems) and Unicos/mk (for the T3E).

Re:battle begins on Update From Cray World · 2001-03-29 06:58 · Score: 2

This is not exactly a new thing for Cray. They have used Suns for *years* as front ends. The SWS (system workstation) for stuff back at least as far as the Y-MP was a Sun 4/3xx rebadged as a Cray and running SunOS. The SWS for more recent Crays are up to SPARC 5's. The Ultra Enterprise 10000 was actually designed by Cray (well, by a company Cray bought called SuperServer, I believe) and sold to Sun either right before or right after the merger with SGI (boy were we ever stupid to give that to Sun!) So this is not like Sun just jumped into a market that they were never in before.

Re:What's it good for if your friends don't have o on Update From Cray World · 2001-03-29 06:50 · Score: 4

The entire idea of supercomputing is obsolete?!?!?!?!?! Someone better tell that to the hundreds of places still buying supercomputers and SGI, IBM, NEC, Fujitsu, Cray, and several other companies who all make them. As people have pointed out, there are many reasons for buying a real supercomputer rather than a bunch of 286's, which I'm sure Google doesn't really use (Pentiums I would believe):

Management: Which would you rather manage - 1024 separate PC's, each with their own boot disk, hostname, power supply, etc. or a Cray T3E with a single system image and one boot disk? Think about the time it would take to do an OS upgrade on the cluster.

Bandwidth: Stuff like Myrinet and Quadrics is quite good but it still doesn't come near the bandwidth that you can get on a traditional supercomputer. Google and SETI@home are *horrible* examples of real scientific code because they do almost no internode communication. We can get 1.6 gigabytes/second full duplex between nodes on our Origin 3000 product. T3E gets even more than that.

Latency: The time it takes to get from node A to node B matters *a lot* with real code. Again, SETI and Google don't care if it takes 100 microseconds instead of 4 to exchange data. When you are exchanging lots of data and synchronizing with many other nodes, this matters. Many massively parallel jobs spend large percentages (like 25%) of their time doing communication. A lot of this is very small messages.

Quality: Usually, you get better components when you buy a supercomputer than a PC. Does this matter for you? Probably not. If you are trying to predict where a tornado is going to touch down, you're going to be a lot more interested in whether the machine is running.

Ease of coding: It is a lot easier to use a model of coding called OpenMP, which relies heavily on shared memory between threads, than MPI, in which you have to explicitly call for communication between processes to happen. OpenMP runs best on large SSI supercomputers.
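The bandwidth and latency points above interact, and a back-of-the-envelope cost model shows how. The Python sketch below uses the usual "latency plus size over bandwidth" approximation. The 1.6 GB/s and the 4 vs. 100 microsecond figures come from the list above; the message sizes and function names are assumptions picked for illustration. Using the same bandwidth for both cases is a deliberate simplification to isolate the latency effect.

```python
def message_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Approximate time to deliver one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


BANDWIDTH = 1.6e9        # bytes/second, the Origin 3000 figure quoted above
LOW_LATENCY = 4e-6       # seconds
HIGH_LATENCY = 100e-6    # seconds

SMALL = 64               # bytes, assumed size of a "very small message"
LARGE = 1_000_000        # bytes, assumed size of a bulk transfer

# How many times slower the high-latency link is for each message size.
ratio_small = (message_time(SMALL, HIGH_LATENCY, BANDWIDTH)
               / message_time(SMALL, LOW_LATENCY, BANDWIDTH))
ratio_large = (message_time(LARGE, HIGH_LATENCY, BANDWIDTH)
               / message_time(LARGE, LOW_LATENCY, BANDWIDTH))

print(f"64-byte message: {ratio_small:.1f}x slower at 100 us")
print(f"1 MB message:    {ratio_large:.2f}x slower at 100 us")
```

Under this model the bulk transfer barely notices the extra latency, while the small message is dominated by it. That is why a code that sends lots of small messages cares far more about latency than the raw bandwidth number suggests.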

Now don't get me wrong - there are many applications for which a cluster is sufficient. This doesn't mean there is no room for supercomputers. Besides, if you look at the direction Quadrics, Myrinet, and the new Infiniband stuff are going, they are going to end up looking a lot like a shared memory supercomputer....

Re:Each Country's Own on India To Launch Its First GSLV Satellite · 2001-03-27 05:45 · Score: 3

The intro was rather confusingly worded. India has satellites. The big deal here is the *launch vehicle*. What that gets them is the ability to charge people for satellite launches.

Re:Mach on Bringing xMach To Life · 2001-03-21 03:56 · Score: 2

I think you are confusing "Mach" and "Microkernel". Mach is a specific implementation of a microkernel and unless I'm horribly mistaken, neither the NT "micro"kernel nor the BeOS microkernel specifically use Mach.

Assuming you meant microkernel and not Mach, you missed one commercial (though admittedly not widely used) Unix OS that uses a microkernel: Unicos/mk, which runs on Cray T3E systems. Of all the OS's you list, I think Unicos/mk wins hands down - it scales to almost 2000 processors and you can reboot individual processors without taking down the entire system.

Re:1MBps? on The Dot in .mars · 2001-03-01 05:24 · Score: 2

Err, I believe that the sliding window on TCP is not tunable per session. I think it's fixed in the OS. This may vary from OS to OS, but I'm not very familiar with network stack implementation.

I suspect NASA will end up inventing a new protocol for this. IP really wasn't designed for the kind of latencies and packet lossiness that you get on deep space links.

Re:1MBps? on The Dot in .mars · 2001-02-27 23:56 · Score: 3

If you fill a 747 with DAT tapes and fly it from LA to New York and then call someone in LA to say that it arrived, you're going to get a lot more than 1MB/sec. In fact, I'll wager that you'd get almost a terabyte a second (how many terabytes of DAT tapes fit on a cargo 747?? :) Granted, it'll take 6 or 7 hours, but you'll deliver a *LOT* of data.
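For anyone who wants to check the wager, here is the arithmetic as a short Python sketch. Every input is an assumption, not a measured figure: roughly 100 tonnes of cargo payload, 12 GB per tape (DDS-3 native capacity), about 40 g per cartridge, and a 6.5 hour flight. It only limits the load by weight and ignores volume, packaging, and the time to read and write the tapes.

```python
PAYLOAD_G = 100_000_000      # assumed cargo payload: 100 tonnes, in grams
TAPE_G = 40                  # assumed mass of one DAT cartridge, in grams
TAPE_BYTES = 12 * 10**9      # assumed capacity per tape: 12 GB
FLIGHT_S = 6.5 * 3600        # assumed LA to New York flight time, in seconds

tapes = PAYLOAD_G // TAPE_G              # how many tapes fit by weight
total_bytes = tapes * TAPE_BYTES         # data carried in one flight
rate = total_bytes / FLIGHT_S            # effective bytes per second

print(f"tapes carried:  {tapes:,}")
print(f"data carried:   {total_bytes / 1e15:.0f} PB")
print(f"effective rate: {rate / 1e12:.2f} TB/s")
```

With these guesses the plane carries tens of petabytes and the effective rate lands on the order of a terabyte per second, so the wager looks about right, give or take whatever the real payload and tape numbers are.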

Basically, latency and bandwidth have nothing to do with each other. The reason we perceive latency to affect bandwidth on the internet is because the internet requires acknowledgements for every n packets. That means that if you have a high latency, it'll take a while for the ACK to come back and thus you slow down the transmission. If you design a protocol that takes into account that an ACK takes 8 minutes to arrive, you can get full bandwidth at high latency. You could even use TCP, if you expand the sliding window to allow it to send, say, 16 minutes worth of packets without requiring an ACK. It would suck for telnet, but streaming data (which is what NASA wants to do) would be fine.
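The window argument can be written down as a bandwidth-delay product. The Python sketch below is only an illustration: the 1 MB/s link rate is taken from the thread title, the 16 minute round trip assumes 8 minutes each way, the function names are hypothetical, and the model is the simple "one window per round trip" approximation with no loss or retransmission. The 65,535 byte figure is the largest window TCP can advertise without the window scaling option.

```python
def window_needed(link_bytes_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight, unacknowledged, to keep the link full."""
    return link_bytes_per_s * rtt_s


def throughput(window_bytes: float, rtt_s: float, link_bytes_per_s: float) -> float:
    """Sustained rate of a sliding-window sender: at most one window per
    round trip, and never more than the link itself can carry."""
    return min(link_bytes_per_s, window_bytes / rtt_s)


LINK = 1_000_000      # bytes/second, assumed from the thread title
RTT = 16 * 60         # seconds, assuming 8 minutes each way

needed = window_needed(LINK, RTT)              # 960,000,000 bytes in flight
unscaled = throughput(65_535, RTT, LINK)       # classic TCP window, no scaling
full = throughput(needed, RTT, LINK)           # window sized to the link

print(f"window needed:        {needed:,} bytes")
print(f"rate with 64K window: {unscaled:.1f} bytes/s")
print(f"rate with big window: {full:,.0f} bytes/s")
```

Under these assumptions a default-sized window crawls along at tens of bytes per second, while a window of about 960 MB keeps the link full. The latency has not changed in either case; only the amount of data allowed in flight has.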

User: fgodfrey

Comments · 356