Slashdot Mirror


Terascale Computing System Installed

lysie writes "The Pittsburgh Supercomputing Center, with Compaq and the NSF, has installed the Terascale Computing System. Worldwide, it's second in power only to ASCI White at Livermore. However, it's the most powerful system in the world for unclassified research--6 teraflops per second. 3,000 Compaq Alpha EV68 microprocessors, in 750 four-processor AlphaServer systems running Tru64 UNIX."

38 of 108 comments

  1. Re:Where's Linux? by warlock · · Score: 5, Informative

    Bah... why not Tru64, why Linux?

    Obviously you've never used Digital Unix, and you are not familiar with their kick ass, highly optimizing compilers... they ain't gonna build a cluster like that to run apache+mod_php and serve crap you know, it's all about number crunching.

  2. I was gonna say it... by feronti · · Score: 2, Insightful

    But they're already working on building a Beowulf cluster of these. Though that sounds really weird, now that I think about it... a Beowulf cluster of clusters... especially since each node in the cluster has 4 processors itself. Wow. Truly fractal computing.

  3. teraflops per second ? by Professeur+Shadoko · · Score: 4, Funny

    a flop is a floating point operation per second.
    a teraflop per second would be an acceleration in processing power... not what the article means I guess
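
    For reference, the 6 teraflops figure in the story is a rate, and it can be roughly reproduced from the processor count. The sketch below is only a back-of-the-envelope check: the 3,000 processors come from the story, while the 1 GHz clock and 2 floating-point operations per cycle are assumptions that the thread does not state.

```python
# Rough peak-rate arithmetic for the system described in the story.
# ASSUMPTIONS (not stated in the thread): 1 GHz clock and 2 floating-point
# operations per cycle per processor.
PROCESSORS = 3000        # from the story: 3,000 Alpha EV68 processors
CLOCK_HZ = 1.0e9         # assumed clock rate
FLOPS_PER_CYCLE = 2      # assumed floating-point operations per cycle


def peak_flops(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in floating-point operations per second."""
    return processors * clock_hz * flops_per_cycle


peak = peak_flops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{peak / 1e12:.1f} teraflops")  # a rate: operations per second
```

    Under those assumptions the result is already "per second", so no second "per second" is needed.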

  4. I thought the Alpha was all but dead... by BitwizeGHC · · Score: 3

    ... good to see somebody getting some serious use out of that trusty old CPU architecture, anyway :)

    --
    N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
    1. Re:I thought the Alpha was all but dead... by Relic+of+the+Future · · Score: 2

      Actually, as I've heard (third hand: a friend of mine is taking a computational physics course, and the professor is involved somehow, and he told them...) the PSC wasn't the only group interested in getting their hands on a few hundred of these chips. If they hadn't placed their order when they did, the NSA would have gotten them instead. As it stands the NSA apparently has a few hundred on back order now... or so he thinks. If he 'knew', he couldn't say... but he's pretty sure. :)

      --
      Those who fail to understand communication protocols, are doomed to repeat them over port 80.
  5. floating-point operations per second per second? by bdg · · Score: 2, Funny

    A rate of acceleration. The first computer to mature with age? Give it a year and it'll be doing 1.89e17 flops. Can you imagine a Beowulf cluster of these? :P

  6. Scales like a real UNIX should by SumDeusExMachina · · Score: 4, Insightful
    3,000 Compaq Alpha EV68 microprocessors, in 750 four-processor AlphaServer systems running Tru64 UNIX.

    There will probably be a lot of people here asking "why isn't this running Linux?", without really knowing what they're talking about. First of all, Linux just doesn't have the kind of scalability that a commercial UNIX, particularly Tru64, does. Secondly, Tru64 is quite well-known for its excellent clustering capabilities, and its tight integration with the Alpha platform leads to high efficiency in computing. Finally, when you are paying $43 million for a supercomputer, you most certainly are going to be running the best software out there too, and frankly, the only reason that people out there are writing free software is that no one would want to pay for their code.

    When you pay for the cost of commercial UNIX systems, you are paying for the assurance that 1) you aren't going to have stupid design flaws like the one the 2.4 kernel has in its inability to use virtual memory efficiently and 2) All of your nice new custom hardware is going to be supported, and frankly, high performance drivers for high-end hardware under Linux are sorely lacking.

    --

    Is your company running tools written by ma
    1. Re:Scales like a real UNIX should by Marcus+Brody · · Score: 3, Insightful
      Although I agree with most of what you are saying, I think you should think about toning down the flamage:


      the only reason that people out there are writing free software is that no one would want to pay for their code.


      This is clearly not the only reason. There are a number of philosophical & practical reasons for free software. Furthermore, there are numerous examples of people who are paid to write free software (e.g. Linus, Alan Cox); and people who are paid to write proprietary code (i.e. they are good enough programmers that someone is willing to pay them) in their job, but also are involved in free software projects in their own time.

    2. Re:Scales like a real UNIX should by Sircus · · Score: 2, Insightful

      "the only reason that people out there are writing free software is that no one would want to pay for their code."

      I'm no expert on Tru64 scalability, but this level of flamebait makes this post highly suspect to my mind. Would someone who both knows something on the subject and can manage to comment without bad-mouthing the competition care to say whether this post is really +3, Insightful?

      --
      PenguiNet: the (shareware) Windows SSH client
    3. Re:Scales like a real UNIX should by bconway · · Score: 4, Informative

      Alpha was the first non-x86 port of Linux, done by Linus himself after being given a DEC machine as a gift on a trip to the US (though I don't recall if it was on loan or not).

      --
      Interested in open source engine management for your Subaru?
    4. Re:Scales like a real UNIX should by oozer · · Score: 2, Informative

      If you are a big enough customer you can get the source to almost anything. Many VAX customers had the source of VMS back in the day (not that you would want to read it as it was mostly written in Pascal and Macro-11 but still, if having the source gives you that warm fuzzy feeling, all to the good). The same was true for large Solaris installations even before Sun 'opened' their source.

      Bearing in mind that this machine was built by Compaq under contract I would find it inexplicable if the systems programmers on site did not have the source to tweak as required.

    5. Re:Scales like a real UNIX should by Paul+Komarek · · Score: 3, Informative

      Some of what you say makes sense, but mostly you sound like an advert. I'll pick on this line:
      "When you pay for the cost of commercial UNIX systems, you are paying for the assurance that 1) you aren't going to have stupid design flaws like the one the 2.4 kernel has in its inability to use virtual memory efficiently and 2) All of your nice new custom hardware is going to be supported"

      Having administered a Tru64 4.0 and 5.0 box, I can't agree with your statements about "what you pay for" when buying commercial UNIX systems. We had to upgrade because Tru64 4.0E did not support more than 8 SCSI devices on a single chain. Why on earth did we have to pay $1000 to be able to support an old SCSI standard?
      We would have moved to linux, except that we have a half-terabyte of ADVFS-formatted data -- i.e. our data is "held hostage" by a proprietary file system format. If all goes well, we'll soon have 700GB of linux-readable space with which we'll rescue our data and then reformat the original array.

      Oh, and let's not forget the time (before I was the admin, thank goodness) the machine was crashed by facilities to stop it from relaying spam -- turns out Tru64 ships (or shipped) with an open mail relay. linux has flaws, but at least you get the flaws for free! ;-) Oh, then there are the Tru64 network drivers for our old tulip based card. The card doesn't support full duplex, but Tru64 tried to make it support full duplex!

      Now there's the reboot cycle it got into, which corrupted the filesystem. However, the disk check ran without errors, and there's nothing unusual in the logs. Tru64 has some great features, none of which we need. We're only using it because we have to. We only paid to upgrade it because we had to.

      -Paul Komarek

    6. Re:Scales like a real UNIX should by Paul+Komarek · · Score: 2

      I didn't actually catch the model number of the machine, but I believe they're using ES-40 (how do you pluralise a model number, without changing the model number?). We have an ES-40 in our lab, and I've spoken with the PSC guys about cheap (i.e. not from Compaq) memory for an ES-40. I got the impression that all of the nodes on their new big machine were ES-40s. But maybe it was a different "new big machine".

      Anyway, we run linux on our ES-40 Model II, and it works great. The machine is a dream to administer (I last thought about it several weeks ago =-). However, we're not doing any parallel processing on ours, and aren't likely to anytime soon. We bought it for expandability. The ES-40 Model II will hold 32 1GB PC100 ECC weird-physical-form-factor DIMMs, and up to 4 cpu cards. We like to think of the cpu multiplicity as "more machines with no extra administration". The memory is great, too -- we've got folks running 15 GB processes. Maybe the program didn't need to use that much memory; but the astronomers don't have to get fancy with their source code, and hence are saving a lot of time and effort.

      You haven't lived (maybe died) until you've waited for a 4GB process to finish dumping core. Thank heavens I haven't been involved in any 15GB "accidents". =-)

      I will say that our Tru64 machine (a Microway dual EV67, similar to a Compaq DS-20) has held up surprisingly well under hellish loads: load averages above 5 for a week are not rare, and I was able to recover from a load avg of 40 during an administrative accident. It will be interesting to see if linux does as well for us. However, I've got a lot of gripes about Tru64, and I'd prefer running linux even if we have to ask the users to play nicer.

      -Paul Komarek

  7. Supercomputers... by maan · · Score: 4, Informative

    I'm in a class at CMU with the head of the PSC...we've been having fun these past weeks, with him talking to us about this "machine". Seems their #1 objective right now is to submit the best possible score for the TOP 500. Apparently, the deadline was October 1st, but then they have some time after that to "rectify" their score...

    There was a fun story apparently about a slowdown that was due to _one_ RAM dimm not seated properly... So 2999 processors were doing their job, but then waiting for the last processor to finish its job, which was taking much longer...
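
    The slowdown in that story is the usual straggler effect in a synchronized parallel job: when every processor has to wait at the end of each step, the step takes as long as the slowest one. A toy sketch follows; the timings are hypothetical numbers chosen for illustration and are not measurements from the PSC machine.

```python
# Toy model of a bulk-synchronous step: all workers must finish before
# the next step starts, so the step lasts as long as the slowest worker.
def step_time(worker_times):
    """Wall-clock time of one synchronized step."""
    return max(worker_times)


def efficiency(worker_times):
    """Fraction of total processor-time spent working rather than waiting."""
    return sum(worker_times) / (len(worker_times) * step_time(worker_times))


# Hypothetical timings: 2,999 healthy processors take 1.0 s each, while
# the one with the badly seated DIMM takes 5.0 s.
times = [1.0] * 2999 + [5.0]
print(step_time(times))             # the whole step waits for the straggler
print(round(efficiency(times), 3))  # most processor-time is spent idle
```

    With these made-up numbers, a single slow node out of 3,000 pulls the efficiency of the whole step down to about 20%.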

    I've seen pictures of this beast. All I can say is: wow. So many cables, so many machines...

    And apparently, they're not yet completely connected. Each box is supposed to have two connections to a "fat tree" Quadrics network. Well right now they only have one... But it seems that Linpack isn't so communication oriented, so it's not too big a strain on the network.

    Maan

  8. Cheer up! by UserChrisCanter4 · · Score: 2

    Pittsburgh Supercomputing Center previously sold one of their old Cray units on eBay.

    So, hey, maybe we'll be bidding on this bad boy in a couple of years...

  9. Terascale Computing System Installed by fea · · Score: 2, Insightful

    So why in the h$#% are we allowing Compaq to rid itself of Alpha processors? For those of us fortunate enough to have them (I use two at work), they are unbelievable number crunchers. Why? Why?

    1. Re:Terascale Computing System Installed by Paul+Komarek · · Score: 2

      At least Transmeta has shown some signs of marketing prowess. DEC and Compaq screwed up marketing the Alphas. At any rate, we've bought 7 alphas. =-)

      And I don't think Compaq sold the Alpha line because they couldn't make it work financially -- I think they sold it because they wanted to make the whole company look attractive to HP. Thank goodness we still have the PowerPC from IBM. That will hold for the next 10 years, at which time Intel might finally deliver a good IA64 implementation. We've been waiting, what, 4 years already? -- with no end in sight, either.

      -Paul Komarek

    2. Re:Terascale Computing System Installed by Paul+Komarek · · Score: 2

      Unfortunately, they didn't ask our permission. That includes shareholders like me. They ask me to vote for board members, as if I ever cared. But when they decide to make a massive strategic change, they just do what they feel like. And selling the technology to Intel feels like twisting the knife. Mike Capellas, do you hear me? You JERK! I feel a little better, now, somehow.

      At least they've still got the iPAQ. Who knows what its future is, though. Maybe we'll get lucky, and Intel will bail on IA64 and just make Alphas. I can dream, can't I?

      -Paul Komarek

  10. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  11. Links to more info on TCS1 by martyb · · Score: 2, Informative

    Here are some links to more information:


  12. Comment removed by account_deleted · · Score: 4, Interesting

    Comment removed based on user account deletion

  13. Re:Where's Linux? by Lumpy · · Score: 2

    Why not Windows XP?

    OW! Stop throwing things at me!

    --
    Do not look at laser with remaining good eye.
  14. Some Science Editor... by BadDoggie · · Score: 2
    "Designers were battling against the speed of light, which limits how fast signals can travel the cables."

    Umm... no. Designers were battling against the speed of electrons through a cable, signal attenuation and noise. The speed of light is that limit on the gigaflops/sec acceleration or something.

    woof.

    I'm off to a bar only 18 kg from my apartment for a 9V glass of beer.

    1. Re:Some Science Editor... by Paul+Komarek · · Score: 2

      I suspect the speed of light problem had more to do with synchronization than with time-in-flight. That is, the difference of time-in-flight between computers is what mattered for maximum performance. For instance, it will take 10 times as long for a signal to travel 10 meters as it will for a signal to travel 1 meter.

      I think it takes about 40 nanoseconds for light to travel 1 meter. A 750MHz cpu does one cycle every 4/3 of a nanosecond. In 40 nanoseconds, a 750MHz cpu has gone through 30 cycles. Because these EV68 chips are really beautiful superscalar processors, a lot of instructions are consumed and retired during 30 cycles.

      If the time-in-flight between nodes is highly asymmetric, I expect it would be difficult to reasonably schedule work across all the nodes. With some machines 1 meter away, and others 33 meters, the nearest machines could get a signal 1000 cycles before furthest machines did.

      Even without all the computation, you can see that the nearest machines could get 33 times more work done than the furthest, when waiting for synchronization signals. What the computations were meant to show was that a 750MHz cpu, and in particular an EV68 Alpha, can do meaningful work during this time.

      Please take note that this kind of analysis is not in my field, and furthermore this post is the first time I've thought about it (though I am familiar with these chips, as I administer and use them daily). So please go easy on the flaming corrections!

      -Paul Komarek
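
      A rough check of the time-in-flight arithmetic above, offered as a sketch rather than a correction from anyone in the field. It assumes the vacuum speed of light, which is an upper bound, since signals in real cable travel slower. On that assumption light covers 1 meter in about 3.3 ns rather than 40 ns, which is roughly 2.5 cycles at 750 MHz, and the skew between nodes 1 meter and 33 meters away comes to about 80 cycles. The 750 MHz clock and the 1 m and 33 m distances are taken from the comment above.

```python
# Time-in-flight check. ASSUMPTION: signals travel at the vacuum speed of
# light; real cables are slower, so these numbers are lower bounds.
C_M_PER_S = 299_792_458.0   # speed of light in vacuum, meters per second
CLOCK_HZ = 750e6            # clock rate quoted in the comment above


def flight_time_ns(distance_m, speed_m_per_s=C_M_PER_S):
    """Nanoseconds for a signal to cover distance_m."""
    return distance_m / speed_m_per_s * 1e9


def cycles_elapsed(distance_m, clock_hz=CLOCK_HZ, speed_m_per_s=C_M_PER_S):
    """CPU cycles that pass while the signal is in flight."""
    return distance_m / speed_m_per_s * clock_hz


for d in (1.0, 33.0):
    print(f"{d:5.1f} m: {flight_time_ns(d):7.2f} ns, {cycles_elapsed(d):6.1f} cycles")

# Skew between the nearest (1 m) and furthest (33 m) nodes, in cycles.
skew = cycles_elapsed(33.0) - cycles_elapsed(1.0)
print(f"skew: {skew:.1f} cycles")
```

      The qualitative point still holds at these smaller numbers: the nearest nodes see a synchronization signal tens of cycles before the furthest ones do.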

    2. Re:Some Science Editor... by Paul+Komarek · · Score: 2

      It just occurred to me that a P4, with its super-long pipeline, would be even more sensitive to timing and synchronization problems in this sort of setting. How many stages does the P4 have, over 20? That's 20 cycles after receiving the "go!" signal before you get an answer out the business end of the P4.

      I'd think this fixed-overhead incurred for each synchronization would have quite an effect for computational loads that required a lot of synchronization. Thus a cpu with a shorter (dare I say "reasonable"? ;-) pipeline would have an advantage on such loads. For instance, I'd like to nominate the EV68 Alpha. ;-)

      It would be great if someone who knew what they were talking about corrected me, since I really don't know what I'm talking about. It would be really great if this person wasn't as lazy as me, and could compare actual pipeline lengths instead of dredging up old and unreliable memories.
      -Paul Komarek

  15. Practical Uses.. by Lumpy · · Score: 2

    The most practical use I see for it is the pleasure of pissing off Compaq by using a processor they decided to abandon... and 3000 of them no less.. That is priceless in its own right, I would love to see their reasons for using the Alpha over the other platforms.

    I would love to see some of the particle simulations this puppy could crunch, we might even see some accurate weather simulations finally..

    I just wish there was a way to get that kind of power available to the universities.. Could you imagine what grad students could do if they didn't have to write a 20 page thesis and description of what they would want to run on the system? Some of the best discoveries are the middle-of-the-night AHA! sessions that need to be run at that moment.

    --
    Do not look at laser with remaining good eye.
    1. Re:Practical Uses.. by Anonymous Coward · · Score: 2, Informative

      PSC is run by Carnegie Mellon University, Pitt, and Penn State, IIRC. I'm only a sophomore at CMU and I've had accounts on two of their machines already, one of them being the Cray T3E. So, right, students do have access to these machines.

    2. Re:Practical Uses.. by PapaZit · · Score: 2

      Actually, the PSC started as a joint venture between The University of Pittsburgh, Carnegie-Mellon University, and Westinghouse. (Trivia: their BITNET nodes were all prefixed with CPWSC for Carnegie Pitt Westinghouse Supercomputing Center: CPWSCA, CPWSCB, etc.) Practically, this meant that Westinghouse supplied the machine room space, CMU provided the staff space and handled administrative stuff (like paychecks) and, well, Pitt students and professors could work at PSC and deal with Pitt's payroll system instead of CMU's.

      The PSC's networking group spun off into the National Center for Network Engineering (NCNE) when PSC proper lost its NSF funding (due to sheer administrative stupidity, IMO). The PSC provides the internet connection to CMU, Pitt, and Penn State, but they have no other real affiliation with Penn State.

      They're pretty liberal about giving out research accounts, so long as you're somehow affiliated with one of their funding sources. Currently, this mostly means that you have to be an academic of some sort in Pennsylvania (including students) or you have to be doing networking research, particularly for Internet 2.

      --
      Forward, retransmit, or republish anything I say here. Just don't misquote me.
  16. Wow! by Junta · · Score: 2

    I know, in a few years 6 teraflops will be nothing, but just read the Slashdot post, we are talking about 6 tera floating operations per second per second, with that sort of acceleration, imagine the teraflops it can achieve in moments. And here I thought processing power would always have to move at a constant speed, no wonder this thing is a big deal. It may be only second best to start with, but in just a few seconds, it will beat the pants off the most powerful system :)

    --
    XML is like violence. If it doesn't solve the problem, use more.
  17. Cool!. by tcc · · Score: 2

    "COOL!!!" that's the reaction when you hear a cluster solution from alpha...

    "HOT!!!" is the reaction you'll get 2 years from now when you see the same thing coming from Intel.

    :)

    --
    --- Metamoderating abusive downgraders since my 300th post.
  18. Re:I cannot see the point by T-Punkt · · Score: 2, Informative

    > TRU/64 is used because Compaq Alphas are 64 bit processors, and
    > Compaq wanted to use their own Linux distribution.

    Err, Tru64 as it's called now, is no Linux distribution. It's Digital/Compaq's version of Unix for the Alpha architecture (or "AXP" if you prefer that). It had the names "Digital Unix" and "DEC OSF/1" before it was renamed to Tru64.

    And please: Don't feed the trolls...

  19. Re:Where's Linux? by Paul+Komarek · · Score: 2

    As far as the compilers go, the DEC compilers and fancy math libs are available for linux (thanks for everything, Maddog!). Many of the optimizing features of the Tru64 version of these compilers are available on the linux version.

    It seems like there is still a small performance delta in my experimental results (on our own Alphas, not on their cluster!), in favor of Tru64. But I can't be sure about this, and the delta wasn't large.

    Thus my conclusion is that 1) they liked something about Tru64 besides the compilers, and/or 2) Compaq liked having their name on the OS running the cluster, and gave them a good deal.

    Somebody else talked about support from Compaq. I expect the good folks at PSC know nearly as much about these machines as the Compaq engineers do. Furthermore, linux is a supported OS for these machines taken singly, though I didn't see "750-member ES40 cluster" in the support options... ;-). I expect that if you buy 750 Alphaservers from Compaq, you won't have to worry about support. =-)
    -Paul Komarek

  20. Late reply... by cr0sh · · Score: 2

    Disclaimer: I am not a kernel programmer, nor a compiler programmer.

    Maybe your friend (and his friends) need to get off their butts, quit passing the bottle, and help make the Linux kernel what it really could be. Same thing goes for people who "laugh" at the inefficiencies of GCC.

    Why is it that they have to sit around, drink beer, and laugh at code on a big screen? That sounds as pathetic as a bunch of beer-gut guys watching football, instead of out there playing it.

    Contribute! That is what is needed.

    However, I bet I know the reason why your friend can do nothing but laugh - he probably sold his soul and signed an NDA. Sucks to be him.

    --
    Reason is the Path to God - Anon
    1. Re:Late reply... by cr0sh · · Score: 2

      Yes, it is shocking that there are people who are all for themselves or for money, and none for others. That is shocking, and sad.

      I understand your argument about kernels for large systems vs. small systems, and how the needs of one may cause issues with the other, if implemented (or not implemented, depending on the direction). Perhaps in this case, the kernel needs to be forked into a parallel dev effort, one side for PCs, etc - the other dedicated to larger systems.

      I don't believe I have blind devotion - I use what I feel is best for my abilities - right now this is Linux. I have given thought to BSD as well, and also wondered and prodded about on various niche OS projects - but Linux seems to be the most viable, in that I don't have to worry about my hardware becoming quickly obsolete because of an OS change, and I don't have to worry about not being able to find and run a piece of favorite software because the OS no longer supports it. BSD allows this too, and I like its slower "rev" cycles - whereas Linux seems to be a frothy mess, everything being updated all the time - but it isn't something I worry about much.

      I just feel it is better to give back - because I know in the end others will give back to me. It has worked for me for a while now, in many areas of my life (not just the open source community). At the end of the day, I know what I have done to help has made a difference for others. Sometimes, I am even told that it has. This to me is better than any amount of money someone could give me for my work.

      --
      Reason is the Path to God - Anon
    2. Re:Late reply... by cr0sh · · Score: 2

      You are right - the debate will never be settled.

      What do I do for a living? I am a software developer for a Phoenix company. Our stuff isn't open source, probably never will be (but who knows?). I don't work 90 hours a week - my time is my time. I will put in extra hours when it is the right thing to do, but I won't do it just for the hell of it (ok, sometimes I do put in extra hours - you get into that "mode", where time just flows, and code is flowing - great state to be in).

      I don't think being paid to do a job is selfish - but if that is all you do with your life - working, getting money, being paid, never giving back - yeah, that is selfish. One could state that he did give back - to his employer - by working a ton of hours, but he didn't give back to the community. That is selfish. He didn't just learn programming on his own. He had teachers. You know it, he knows it, and I know it. I give back because of all the people I learned from typing in code from magazines and books from when I was a kid. I give back because of the numerous examples I have found online about ways of doing things. I give back so someone else may learn from me, and teach others along the way.

      But I do this on my own time. Not my employer's, my own. I do this because I love computers and software and coding - not because of these things can potentially make me money - but because of the worlds they have opened up for me. The insights, the freedom, the knowledge, the friendships - all of it!!! These things are things I cherish - and I wish to give others the chance to share in the same ways and feelings I have tread and experienced.

      It seems like today companies and people only want to make money, let no knowledge out, never truly give back. But should they succeed, they will simply be causing their own ultimate demise, for where will the new information creators come from? The schools? Perhaps - but what about those who don't go, or can't afford, higher education? Should they be denied these things? Should they not be able to program computers, render 3D graphics, or build their own OS, should they so desire?

      The corps are saying "Yes! THEY MUST BE DENIED!" in their mad rush to censor and restrict the flow of all information - not just copyrighted information (which I don't have a problem with, were it not that copyright extends forever anymore - ie, Sonny Bono/Disney Copyright Act, DMCA, SSMCA, etc). They want to even stop libraries, the internet itself. They are killing themselves and don't even know it, nor care.

      Don't cut out Open Source yet - it isn't over. The bubble burst because of bad investment decisions, investment by VCs who would take a business plan written on a barroom napkin. They were stupid, and arrogant. Many great projects have benefitted from the open source and GPL philosophies. Maybe they haven't made money yet because they haven't found the right business model. Maybe they won't find the right business model. But they should at least try, especially now in these more "sane" (and I could argue less sane as well) times.

      As far as my beliefs in OSS - could I write an operating system as you say? No, I could not - not because I couldn't learn how to do it - there are plenty of ways to learn how, and I am sure I could learn it if I was so inclined. I don't believe I could do it, for the same reasons that Linus didn't create the entire Linux kernel of today himself. He created and released, from the first release, a very basic kernel. Improvements by others rolled in, and he incorporated them steadily, along with his own improvements. The thing kept getting bigger, until it got to where it is today. Kernel creation and design by near anarchy is what it is. So no, I couldn't do it myself. I would be willing to bet, though, with a proper plan and good software design, and the release of a simple kernel that followed that design, you might be able to amass enough people to continue with development. The problem would be getting enough people who have access to the same kind of large scale architecture, which may be where and why this kind of development would fall apart. Of course, if you could get a company to "donate" a large dev machine to work on, it could be done. I believe IBM (or someone) actually has done this. Whether they did it for altruistic reasons or not is another issue...

      --
      Reason is the Path to God - Anon
  21. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  22. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  23. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion