Domain: teragrid.org
Stories and comments across the archive that link to teragrid.org.
Comments · 20
-
Re:College is more than listening to a lecture.
The truth is that most modern Universities only emphasize teaching vs. research about 50% of the time because at least half of all the activity in a modern University IS (and always has been) research. Sure, you do have to start out the first year or two in those big lecture halls learning the basics with all the other students, and perhaps that's where online components can help the most. But the best professors are the ones that recognize not only the top students in the class, but also the ones truly interested in going the extra mile, engage them in the classroom, and get them interested in helping with their research. The best students are the ones that realize this, and become more involved in the research activities in their department (they learn a whole lot more that way). These are the students that succeed. The rest of the students, and the bulk of the ones that go on to complain about unavailable teachers and professors that don't care about their students, are the ones that have somehow come to expect that going to college is the next thing to do after high school and necessary in their path to that six figure income with the corner office (that most of them are now realizing is a bunch of crap). Online courses are popular with the for-profit sector because those "colleges" are only interested in getting more students paying their overpriced tuition into the pockets of their rich administrators, while churning out useless sheets of parchment to hang in the family rooms of unemployed former corporate drones that thought they were getting that degree to get the promotion that dried up when the company downsized.
Interestingly, many Universities are utilizing the Internet heavily for research activities, whether for reading the latest literature, creating online surveys on a variety of topics, communicating with patients, or even doing science "experiments" on supercomputers. True, you can't exactly inject mice or synthesize compounds on a computer, but you can run simulations of proteins and small molecules, and even run financial simulations and other calculations. And it's also easy to engage students to get involved in this sort of research, too, because all they really need is to use their computer to connect remotely to a campus computing cluster. There's not too much overhead in terms of laboratory space and chemicals to order and things of that nature -- the supercomputing clusters can be shared among multiple research groups on campus, or even across campuses, such as on the TeraGrid.
-
Re:Intel at it again...
The key word here is efficient. Specifically I am talking about operations per watt. If some combination of heat dissipation and cost to run the system are limiting factors, then this kind of efficiency is important.
But in the HPC world, the real limiting factor is the interconnect and the software interface. The interconnect latency determines how large of a job can finish in reasonable time, and is a fixed (high) cost per CPU. Meanwhile the software interface determines what off-the-shelf software will work with minimal investment. It's not worth spending programmer ($100k/year) or even graduate student ($40k/yr) time chasing a few watts when you have funding agencies expecting actual scientific results in the next quarterly review.
BTW, those custom supercomputers are neat and get to set records running highly specialized (and massively expensive in terms of programmer-hours) code, but look at https://www.teragrid.org/web/user-support/compute_resources and tell me how many TFLOPs are provided by x64 machines and how many are PowerPC and other weirdos.
TLDR: When you are pushing FLOPs, x64 (Intel and AMD alike) is king because it provides the most power per CPU. We care about watts, but don't want to screw either our scaling or networking-gear-budget.
-
Re:Thank God!
The computational resources are available. If the researcher needs clock time, he can talk to the folks at TeraGrid, among others. Of course, the researcher you mentioned was doing something similar to what OP wants, although more politely and probably the "correct" way, which is to try to get people who are working on problem X to work on cancer instead. At least the oncologist was "walking the walk" in that he is actually working on his topic of interest instead of just complaining that there is no cure for cancer.
-
Clu
If you want a monitor that can display useful information about thousands of nodes on a single display try clumon. We use it for our 1000+ node clusters. The software was developed in-house but is available under the University of Illinois/NCSA Open Source License Copyright (noticeware). If you're just going to use this in-house, the license shouldn't be an issue.
You can see a sample clumon display of a working cluster at NCSA Linux Cluster Monitor. The clumon page for that cluster shows you the job status of each individual node (if the node is colored, it has a job assigned), the load on the machine (the height of the line is proportional to the load, and red tips show loads over 1.0 per cpu) and the service status (green underline is ready, yellow/black stripes is offline, and red is unexpected offline/no comms). If you mouse-over a node, a status box pops up with more information on that specific node.
As this was designed for a cluster with the Torque resource manager, it won't be exactly what you need, but since you are willing to write a monitor from scratch, it might be a really useful starting point. Design-wise, this monitor allows the engineer or manager to see what's going on in general, with problem areas being immediately obvious, and without being overly cluttered.
The open source Performance Co-Pilot software runs on each node to collect information, which is polled by the central server. Back end is MySQL. The dynamic display is PHP.
Straightforward, useful and very configurable.
-
Re:Naive question...
For a reasonable sample of the things that can be done on a supercomputer, start here: http://www.ncsa.uiuc.edu/Projects/. Those are just the things running at NCSA.
Follow up with this, as the science gateways for the TeraGrid are designed to let scientists worry more about the science part and less about the programming part. Part of the reason to build bigger supercomputers is to let non-programmers get work done as well. By having more cycles available, the TeraGrid can allow access for codes that are easier for the average scientist to use, even if they don't make the best use of the machine. Not everyone is a whiz at parallel programming, and we shouldn't expect an expert in, say, biology to be just as expert in computer science.
-
Re:Duh?
Most problems do not parallelize to large scales. Name a single real world problem that doesn't parallelize.
Obviously this depends on your definition of "real world." Many simulation problems in the physical sciences do not scale well, since each cell's step is dependent on all other cells. There are approximations that try to reduce this dependency, but approximations are never perfect. However, one may discount these as not "real world" since most people don't simulate low-level physics and such (and these aren't NP complete either, which are sometimes parallelizable. E.g. when you double your problem, it may cause an 8x increase in work, but that may be parallelizable. You can parallelize the traveling salesman problem which is NP complete according to Wikipedia.)
There is a much larger class of problems that don't scale to very large scales. As I recall from my parallel programming class, after about 64 or 128 processors, shared memory breaks down due to limitations on bus interconnections needed for cache coherency. You can emulate shared memory with MPI and things, but it's WAY slower to the point of being useless for applications without a high degree of spatial locality. In fact, all but the embarrassingly parallel don't scale linearly due to shared memory and synchronization, so I've yet to see many non-trivial problems that scale to massive levels well. I'm talking 1000 processors or more (which is where we are headed, it seems, since they can't increase processor speed much. They have to do something to sell us new CPUs.) You may double the processors but only get a 20% speedup. One of many examples after 15 seconds of Googling, here. Another one here, where they doubled the processors and only got a semi-logarithmic increase in speedup (very common from what I recall from class.) Database updates won't scale well, since fundamentally you need some concurrency control to ensure ACID and that can't scale forever.
So almost anything can parallelize, but not everything can do so well. Sure it may be faster, but not nearly as fast as doubling the CPU speed. For many systems going to 2 or 4 processors will help a lot, since people also use multiple programs or services in the background, but that's low hanging fruit. (And welcome, I use dual core CPUs and find it helps for that reason. But 1000 cores? I don't think that will help do any common task for an average user.)
So, basically, I think we are all right. It's generally faster with more CPUs, just not much faster in the higher cases and we'll reach a point of diminishing returns. Is it worth it to double the cost of a CPU for a 5% speedup? For some, I'm sure, but eventually it just won't be worth it to increase the number of cores. I used to work with people who made parallel simulations and they'd spend years getting an application working on a specific architecture. They'd be ecstatic when they got a 10% speedup. Not really practical for most consumer products.
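The diminishing returns described in this comment can be illustrated with Amdahl's law, which is a simplified textbook model and not something the commenter cites or measured. The sketch below assumes a hypothetical workload in which 95% of the run time parallelizes perfectly; that fraction and the function name `amdahl_speedup` are illustrative assumptions, and the model ignores the synchronization and cache-coherency costs mentioned above, which make real scaling worse.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the single-processor run time that can be
    spread across processors (an assumed input, not a measured value).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is untouched; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed 95% parallel workload, purely for illustration.
    for n in (1, 2, 4, 64, 128, 1000):
        print(f"{n:5d} processors -> {amdahl_speedup(0.95, n):6.2f}x")
```

Under this model the speedup can never exceed 1 / (1 - parallel_fraction), so with a 5% serial portion even 1000 processors stay below 20x, which is consistent with the point that each doubling buys less than the one before.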
-
Re:Anyone out there care to comment?
http://tg-monitor.ncsa.teragrid.org/
Well, after clicking on the Reservations link it appears that the system is, in fact, not busy at all:
Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /var/www/html/maui-current.php on line 49
There are currently no jobs scheduled.
Warning: main(maui-pend.php): failed to open stream: No such file or directory in /var/www/html/maui-res.php on line 93
Fatal error: main(): Failed opening required 'maui-pend.php' (include_path='.:/usr/share/pear') in /var/www/html/maui-res.php on line 93 -
Re:Anyone out there care to comment?
>Actually you can check out the usage of this cluster online at http://tg-monitor.ncsa.teragrid.org/
That ancient console video game on the webpage looks pretty cool but what are the rules of it and which key to move the little spaceship around? -
Re:Anyone out there care to comment?
I've been using several supercomputers for my research project. Most of them are very busy. E.g., on the IBM P690 (Cheetah) at Oak Ridge National Labs, you have to wait for a week to get your 512 processor job scheduled. This is an extremely busy system. On the other hand, you have systems like the Itanium cluster at NCSA (National Center for Supercomputing Applications) which schedules your jobs a lot quicker. Actually you can check out the usage of this cluster online at http://tg-monitor.ncsa.teragrid.org/ (don't slashdot it, it is quite useful to a lot of researchers :-) ) -
Re:data reception?
-
Re:Did I read that right?
- PSC: 6 TFlops
- NCSA + Caltech + ANL + SDSC: 15 TFlops
I would encourage you to visit www.teragrid.org and read more than the front page to learn what the TeraGrid is.
-
Re:When will we do this ourselves?
We (the USA) have multiple times:
The TeraGrid is the NSF flagship for grid computing - be it good or bad.
The Grid.org people are some of the former SETI@home people gone more general purpose.
And of course, there is The Global Grid Forum which is meeting in Chicago in a week or so. GGF is the standards body behind the Globus-enabled grid.
We could ask why CERN et al. couldn't have come up with a slightly more imaginative name?
We can also ask why NSF are such suckers for the last 20 years of hype from the people who have run the national supercomputer centers in the USA? Ditto congress. But that is a (sad) story for a different day.
And finally we can ask what Top500.org is going to do when people begin reporting HPL benchmarks using these things? That HPL became the standard that people are designing supercomputers around argues just how totally screwed up high performance computing really is at the moment.
-- Multics
-
The national centers USE Linux Clusters already
We have 2 Linux clusters here at NCSA already, with a third in progress. See:
The Titan Cluster
The Platinum Cluster
TeraGrid Clusters Successfully Installed at NCSA
These clusters run either RedHat or SuSE Linux and are available for researchers nationwide. These clusters are not beowulf; they allow access through a general scheduler and have MPI to run programs that use a group of nodes at once. This gives the greatest flexibility to the users to create a computational system that can be optimized for the size and needs of their problem. The size of a cluster that can be supported at a national center allows enough computational power to solve problems that can't be solved elsewhere. Given that a cluster of 128 nodes is now considered an institutional asset and within the purchasing power of any university, it makes sense to use federal funds to create systems to handle problems beyond the scale of a cluster that any university might own.
Another aspect of this issue arises in the assumption that cluster computing is so easily accomplished that it might be compared to the setup of a single system. I respectfully submit that the simplest of clusters is none too easy to deploy and use as of today, not to mention the lack of support one gets for the application of their scientific research to a stock parallel computing platform. The national centers can afford to have consultants and researchers on staff that specialize in these matters, as well as full-time admins.
Note: The opinions expressed here are my own and not necessarily representative of my employer or the federal government. In addition, given that I am employed by NCSA, a slight element of bias may be present in my statements.
:) -
TeraGrid
Here is a large Grid project that I'm working on.
-
TeraGrid Backplane
For comparison, the TeraGrid backplane, running between hubs in Los Angeles and Chicago, is supposed to have a capacity of 40 Gb/s. No speed records yet; they're just sending the first test packets.
That's about 3000 kilometers. Assuming lightspeed transmission, there could theoretically be something like 40 or 50 megabytes of data at a time in transit. -
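The in-transit estimate in the TeraGrid backplane comment above is a bandwidth-delay product, and a short sketch can check the arithmetic. The 40 Gb/s capacity and 3000 km distance come from the comment; the function name `bytes_in_flight` and the 2/3 velocity factor used for the second case (a typical rough figure for light in optical fiber) are assumptions added for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in vacuum


def bytes_in_flight(capacity_bits_per_s: float,
                    distance_m: float,
                    velocity_factor: float = 1.0) -> float:
    """One-way bandwidth-delay product, in bytes.

    velocity_factor scales the propagation speed relative to vacuum;
    1.0 matches the comment's "assuming lightspeed transmission".
    """
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity_factor must be in (0, 1]")
    one_way_delay_s = distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)
    return capacity_bits_per_s * one_way_delay_s / 8


if __name__ == "__main__":
    capacity = 40e9   # 40 Gb/s, from the comment
    distance = 3.0e6  # about 3000 km, from the comment
    vacuum = bytes_in_flight(capacity, distance)
    fiber = bytes_in_flight(capacity, distance, velocity_factor=2 / 3)  # assumed
    print(f"at vacuum lightspeed: {vacuum / 1e6:.1f} MB in transit")
    print(f"at 2/3 lightspeed:    {fiber / 1e6:.1f} MB in transit")
```

At vacuum lightspeed this gives roughly 50 megabytes, matching the upper end of the comment's estimate; a slower propagation speed in fiber would put proportionally more data in flight.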
What is Grid Computing?
I've seen a ton of questions asking what Grid computing is. The most common one is how it differs from parallel/distributed computing.
First off, I highly suggest reading The Anatomy of the Grid by Ian Foster et al. It provides a pretty good overview of this whole Grid thing.
But for the lazy, here's a little bit. The Grid is more than parallel computing. Typically with parallel/distributed computing the problem or the resources (or both) are static. Grid allows both of these to change. In a nutshell, Grid computing means not having to worry about where the compute resources are. Just start a calculation and it gets done. Just like how you don't worry where your power comes from, you just plug in.
The core of the Grid is virtual organizations. Under a VO, I could get together with a few friends and pool our resources. We could set up a registry and some factories (I'm speaking OGSA here, but whatever) and create some certificates. Then, we could submit jobs to the Grid and not have to worry about the resources that they're running on.
GSI provides some really nifty security features (based on X.509 I believe). Basically you provide a mapping that allows other authorized users to run commands on your computer. When you're on the Grid you create a proxy for your certificate that is passed to the process that you run on this other computer. Then if that computer needs more resources, it can create another proxy certificate and delegate to another server.
Also, Grid computing is more than just computing. There is data storage and instrumentation sharing also. You might want to check out PPDG, GriPhyN and TeraGrid for examples of these systems.
If you're interested in playing with the Grid, you can go download Globus Toolkit 3.0 Alpha or the Java CoG Kit which is a pure Java implementation of Globus 2.x (it's much easier to install than the regular Globus 2.2.x). -
The TeraGrid and the TeraScale machine
They're both NSF-funded systems. Any scientist in the country can get time on them, just by writing a proposal. A peer-review committee then decides who gets time and who doesn't, or if there's a better machine to use. They're both part of the NSF PACI (Partnerships for Advanced Computational Infrastructure). See www.paci.org for more info on getting time.
The money for the TeraScale machine was awarded last year, and it went to the Pittsburgh Supercomputing Center. The follow-on to the TeraScale machine was an award made two months ago, the Distributed TeraScale Facility, or the DTF. The DTF award went to NCSA in Illinois, SDSC in San Diego, Caltech, and Argonne National Lab. The winners decided to rename the DTF the TeraGrid. They've got a web page about the new system at www.teragrid.org.
-
TeraGrid at SC2001
TeraGrid will be present at SC2001 (a yearly conference and expo for supercomputing and high-performance networking). Just to give you a hint of what it is like, the showfloor will have more than 10Gb/s of total outgoing Internet capacity (plus more private/non-IP circuits).
If you're going to be in Denver the week of Nov 12, 2001, consider stopping by. If nothing else, the place will have free and open 802.11b!