Domain: nersc.gov
Stories and comments across the archive that link to nersc.gov.
Comments · 71
-
Re:What distribution?
... a version of Linux optimized for 65 thousand processors ... a 65 thousand thread program ...
Uh, NO!
The linux distro will be an extremely stripped-down version of the kernel that will only contain the bare necessities.
Blue Gene is NOT a shared-memory computer -- with a single kernel running all 64k processors -- but rather a cluster of 32k separate computers (with two processors each), each on one chip.
The best info can be found in this large NERSC report, and at the blue gene website.
By the way, Blue Gene itself will probably not be used for nuclear weapons research. That is what "Blue Gene L" is for.
-
Might want to check out what Cray and Sandia
Cray and Sandia say it is a 40 tera*OP* system, not a 100 teraflop one. See what Cray says here and what Sandia says here. The really interesting thing is not the processor but the interconnect, which seems to be very similar to the torus used in the T3E.
In other supercomputing news, check out what NERSC is proposing for their Earth Simulator Response Proposal. It's a 160 teraflop machine...
-
Re:Something to look forward to
Ahem. From D. Bailey and R. Crandall, On the random character of fundamental constant expansions, Experimental Mathematics 10 (2001), p.276:
"Even the weaker assertion that every finite digit string appears in the expansion has not been established, to our knowledge"
-
Open Source is already in the Government
There are lots of examples of Open Source sw being used in the government. It's already used by NASA on the International Space Station and on various spaceflight experiments such as Flight Linux. NERSC also works with Linux and provides M-VIA, which is an implementation of the Virtual Interface Architecture (VIA) for Linux. These are just a few of the places in government where Open Source sw is already being used.
The government, as explained in Micheal's text, needs to account for its spending and show transparency...it cannot favor *anybody* or any *product* without justification. Therefore, it is only logical that at this time we find Open Source being used in the Research and Development areas of the government, where the flexibility and COST of Open Source sw give it an undeniable advantage.
-
In kernel checkpoint support?
I'd really like to see one of the checkpoint patches included in the mainline kernel series. There are several to choose from: EPCKPT, CRAK, CP.... Which one doesn't matter (feature-wise); they all basically allow the kernel to stop a process, save its state and pages to a file, and then load and restart that process on request.
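The stop / save / restart cycle described above can be illustrated at the application level. The following is only a sketch in Python, not how EPCKPT, CRAK or CP work: those patches save a process's state from inside the kernel, while this example has the program save its own state with `pickle`. The function name `run`, the `stop_after` parameter and the file layout are all hypothetical, chosen for illustration.

```python
import os
import pickle


def run(steps, ckpt_path, stop_after=None):
    """Sum the integers 0..steps-1, checkpointing after every step.

    If ckpt_path already exists, the computation resumes from the saved
    state instead of starting over. stop_after (hypothetical, for the
    demo only) simulates the job being killed at that step.
    """
    state = {"i": 0, "total": 0}
    if os.path.exists(ckpt_path):
        # Restart: load the state saved by an earlier, interrupted run.
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)

    while state["i"] < steps:
        if stop_after is not None and state["i"] >= stop_after:
            return None  # simulated interruption; checkpoint stays on disk
        state["total"] += state["i"]
        state["i"] += 1
        # Save to a temporary file first, then rename it over the old
        # checkpoint, so a crash mid-write never leaves a truncated file.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, ckpt_path)

    return state["total"]
```

Calling `run(10, path, stop_after=4)` stops early and leaves a checkpoint behind; calling `run(10, path)` afterwards picks up at step 4 and finishes the sum. The drawback of this approach is exactly what the comment is getting at: every application has to be written to checkpoint itself, whereas kernel support would work for unmodified programs.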
Yes, I could distribute a patched kernel across all of my systems. But then I'm tied to that kernel until whichever project I'm following updates their patch (or I update it myself, and I don't consider myself competent as a kernel hacker). This would be a really useful mainline feature for those of us in the scientific computing community. Wasn't there some talk of one of these going in 2.6 proper?
--M
-
Re:Japanese scare
The politics that follow this 'sale' ought to be rather interesting. NCAR bought a Japanese supercomputer some time back and nearly got wiped out by funding deletion by the US Congress.
What happens next ought to be VERY interesting.
On the other hand, the Cray employees I've talked to - needling them for giving in to the dark side and selling an SX-6 - have said that anything that is good for vector computing is good for Cray: they can always sell a follow-on with their SV-2 and SV-2e.
I skimmed a post above that stated something to the effect of "you'll never touch [a supercomputer]." We, at NERSC, are still looking for a few good sysadmins. Keep in mind we're pretty brutal about who we let in, but if you think you have the right stuff to be a sysadmin on some of the world's most powerful machines...;)
-
Bandwidth Challenge SC2001
Every year there is a competition at the high-performance computing conference (Supercomputing 2001 was the most recent one). It is called the 'Bandwidth Challenge'. This last year, NERSC took first place with a graphically represented simulation sustaining 3.3 gigabits/second, using Seaborg.
Now, admittedly, it wasn't intercontinental, only from Oakland, CA to Denver, CO....:D
-
NERSC ( was Re:LBL Uses them)
Actually, there's quite a lot of Linux at LBL. I worked there until June, so I have some idea what I'm talking about. There is PDSF, which is a giant node farm of a couple of hundred machines in a Beowulf-like system.
PDSF has grown to a bit over 300 as of the end of this fiscal year. It's one of the systems that NERSC runs (you know that Rob, but for the uninitiated...). I am not sure that I would call it a Beowulf though.
Most of our laptops are lil Linux machines. For desktops we still use Solaris boxes. We also have FreeBSD on a few servers here and there.
As for other *nix machines, we have the Crays (a 68-processor SV-1 cluster and a ~700-processor T3E, with Unicos and Unicos/mk respectively), PDSF (a 300-odd CPU mix of Intel and AMD machines w/ Linux), Alvarez (a ~200(?) CPU Beowulfish cluster), and Seaborg (a 3000+ processor IBM SP system running AIX).
-
Re:Where's the code ?
Right here in C
-
No! No! No!
From the discoverer of the formula:
"This formula permits one to compute the n-th binary or hexadecimal digit of pi, without computing the first n-1 digits, by means of a simple scheme that requires very little memory and no multiple precision software. Here are various items that may be of interest: "
Yes, that's right, binary or hexadecimal. In other words, you can forget about doing "cool stuff" with it. Who cares what pi is in binary?
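For anyone curious how the digit extraction works, here is a minimal Python sketch. It assumes the formula being quoted is the Bailey-Borwein-Plouffe (BBP) series, pi = sum over k >= 0 of 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)), which the quote above does not spell out. The function names are made up for illustration, and this is not the code linked elsewhere in the thread.

```python
def _bbp_series(j, n):
    """Fractional part of sum_{k>=0} 16**(n-k) / (8*k + j)."""
    s = 0.0
    # Terms with k <= n: only the fractional part matters, so reduce
    # 16**(n-k) modulo the denominator with three-argument pow().
    for k in range(n + 1):
        d = 8 * k + j
        s = (s + pow(16, n - k, d) / d) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; add them
    # until they no longer affect a double-precision result.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at offset n after the point (n = 0 is the first)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]
```

Since pi is 3.243f6a88... in hexadecimal, `pi_hex_digit(0)` gives "2" and the first eight calls spell out "243f6a88". Because it relies on ordinary floating point, the sketch is only trustworthy for moderate n. It also shows why the trick is tied to base 16 (and hence base 2): the modular reduction depends on the powers of 16 in the series, so it does not hand you decimal digits.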
-
Re:nth digit of pi
I think the more generalized algorithm (which might be what they are talking about) is the PSLQ, which is only like a year and a half old. They talk a little about it at http://www.nersc.gov/news/bailey1-20-00.html, but the PSLQ link seems to have been removed due to a copyright lawsuit.
-
Re:Hmmm, YABL (Yet, Another, Broken, Link)
it was /.ed - also try his home page for more links and such.
-
Algorithm sources and other stuff
At the man's homepage.
http://www.nersc.gov/~dhbailey/
Check out the piqp.c in the middle of the page.
-
Re:8.5W?
These guys have a Pentium III-500MHz system which is supposed to run at 5W typically. We have one and have been running some of our code on it (it's actually owned by another group and we're borrowing time on it for a while.)
Their claims would be impressive if they made it work at 8.5W max, rather than 8.5W typical.
-
LLNL B-451 - the home of ASCI white
B-451 used to be the unclassified National Energy Research Computer Center. It has been the home to a lot of neat technology over the years. It started out with a CDC-7600 and a CDC-6400, then Cray 1 Serial # 6 was added. That machine had 500K 64-bit words of memory and 16 CDC DD-14 disk drives ( 300 MB each ! ). Cray 1 Serial # 33 was added ( 1 million words of memory ! ) and later an XMP, which was a multiprocessor machine.
The center was the home to Serial # 1 Cray 2, nicknamed Bubbles. That machine was delivered on a Monday morning and was able to run the Livermore-written operating system ( CTSS, Cray TimeSharing System ) for a few minutes on Friday - till it crashed. The machine was in full production within a few weeks of delivery, quite an achievement for any first-off supercomputer. The NERSC Engineering crew had built a hardware simulator of the Cray 2 and given the OS / Compiler / Library guys months of time to work on debugging the system and supporting codes. The operating system was written in a variant of Fortran, and occupied about 2 - 3 percent of the CPU resources.
NERSC was built as a Triad - a group of supercomputers, a large hierarchical file storage system tightly coupled to those systems, and a high speed ( for the day ) network connecting a nationwide user base to those resources. NERSC ( actually its predecessor, the National Magnetic Fusion Energy Computer Center, NMFECC ) was believed to be the first ever attempt to provide supercomputer access to a geographically distributed user community.
Later machines added to NERSC included the Cray YMP-C90 and the Cray T-3D, a particularly interesting "bridge" machine which provided users with both the then-standard Cray vector processor architecture and a torus-connected array of 128 Alpha processor nodes. This gave a user the ability to start with a basically unmodified code which had run on a Cray computer and then analyze the section of code which took the most time in the run. The programmer could then work at converting that section to run in the toroidal array, hopefully enhancing performance. The machine enabled programmers to learn the new parallel architecture with baby steps.
NERSC got moved to Lawrence Berkeley National Lab in 1995-1996; they are still on-line at http://www.nersc.gov .
Zoot
-
Re:CentraVision's license?
A kernel module falls under the GPL. Yes, I know, binary-only modules are allowed by convention, but it still sucks.
You're going to be out of luck should you find a later kernel gives better performance but breaks binary compatibility. Think about proper async I/O, which is coming and can give a handy boost. If you have the budget for Fibre Channel fabrics at some point, at least look at the Global File System.
BTW, if you're going to compare this cluster with a Cray T3E or IBM SP, actually compare them, don't just say they're comparable. The T3E's network is one-of-a-kind, with large bandwidth and almost no latency. (And I certainly wouldn't compare MPI implementations. Myricom's sucks and is causing no end of problems for some other projects.) You can't compare on that aspect with any commercially available interconnect. And there are much larger SPs around and coming, like San Diego's and the second phase of NERSC's.
Don't take this the wrong way. What you've put together is impressive, especially surviving the procurement process, but there's still a lot of work to be done to catch up with the big boys. You know that, but a good many people reading the interview may walk away with a good-we're-at-the-leading-edge-now impression. We aren't. We're at the cost-effective edge, but we can make the leading edge...
-
The Beowulf issue is management.
With Beowulfs getting ever bigger, the issues faced by administrators of these beasts are very much related to management.
Fortunately, some projects address this issue, such as BLD (free, Berkeley Lab), ALINKA RAISIN (commercial) and ALINKA LCM (GPL), but there are still things to be done. Moreover, once you have overcome the software management, you still have to deal with the hardware (of these 1000 fans, one _has_ to fail...).
Still, the hardest job is not for the administrators: users have to actually write good parallel code... and this is no piece of cake.
-
Re:They're using VI for *that*?
In this context ``VI'' stands for Virtual Interface. It is a way to get low-latency communication between processes within a cluster. It can accomplish this by having less protocol overhead than routed IP protocols, and by avoiding user-to-kernel context switches and user-to-kernel buffer copies. In the ideal case the data goes from a user buffer to the NIC by DMA with no kernel participation. Data is also DMA'ed directly from the NIC to a user buffer. Of course this requires a little bit of help from special hardware or firmware on the NIC.
You can find general info at http://www.viarch.org.
Info on a Linux version which can work without special support from the NIC is available from http://www.nersc.gov/research/ftg/via.
-
Re:Maybe you have an answer to my question
Actually, quite a few cluster environments run SMP for one very good reason: communication overhead. Communication is much faster in a shared memory environment.
NERSC, for example, has recently purchased an IBM SP system which has two processors (or was it 4?) per node, with plans to upgrade to 16 processors per node.
The problem with SMP and clusters is that the message passing software has to be smart enough to take advantage of the shared memory situation, and this can also complicate things when you try to optimize your code.