Slashdot Mirror


Science Grid Genesis

Cranial Dome writes "According to this Cnet.com story, the Department of Energy (DOE) is working to interconnect the first two computers which will form the genesis of the DOE Science Grid, a virtual supercomputing system which will eventually encompass many more systems at several locations. The larger of the two machines: DOE National Energy Research Science Center's (NERSC) IBM SP RS/6000, a distributed memory machine with 2,944 compute processors. This machine, together with a smaller 160 processor Intel system, will make up a combined 3,328 processor Unix system with 1.3 petabytes(!) of storage space. And this is only the beginning..."

38 of 166 comments

  1. 1.3 petabytes? by alen · · Score: 3, Funny

    I guess it's going to be enough space for a full install of the latest Red Hat distro.

    1. Re:1.3 petabytes? by compwiz3688 · · Score: 2, Funny

      Alright!! I can finally download the whole Internet and browse offline! :)

      Ok, I don't know the size of the Internet. I'm just guessing...

    2. Re:1.3 petabytes? by sharkey · · Score: 2

      Or the DivX rip of Kevin Costner's next movie.

      --

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  2. 5 years from now by ch-chuck · · Score: 3, Funny

    AOL/TW starts mailing out free sign up DVD's to access their portal to the Science Grid. Within days messages start appearing in highly technical discussion forums that simply state "Me Too!".

    --
    try { do() || do_not(); } catch (JediException err) { yoda(err); }
    1. Re:5 years from now by geekoid · · Score: 2

      I don't know whether to laugh or cry.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    2. Re:5 years from now by Cutriss · · Score: 3, Funny

      That's right - The Department of Energy in 2008 will post a new RFC, and within seconds, it's already got five comments, each saying "Frist Psot!"...

      --
      "Mod, mod, mod...and another troll bites the dust."
  3. Or... by wiredog · · Score: 2

    Win 2004. With all the options.

  4. (Slightly OT) 1.3 Petabytes? by Ecyrd · · Score: 3, Informative

    According to this paper, the entire human life takes roughly a petabyte of storage.

    Using the current prices, this amounts to roughly 150.000. It's not that impossible to store your entire life on a single computer anymore. These guys show that such a thing can be built.

    1. Re:(Slightly OT) 1.3 Petabytes? by Ecyrd · · Score: 2

      150.000, that is roughly, what, $120,000? Yeah, we use a comma to separate decimals, and a dot to separate thousands - sometimes you get confused when writing English :-).

      Note that the price actually gets spread out throughout all your life. If you started now, you'd need only about $5000 every year to buy the necessary hard drives. And considering the speed at which prices have been going down per Megabyte, it is likely that the original estimate of $120,000 is the upper bound, and the REAL price is a lot lower.

  5. Hmmm, This and the PS3 by gwizah · · Score: 4, Insightful

    Well it seems as though we may now know what Sony Engineers mean by "Distributed Computing"

    Seriously though, what type of security system is the DOE building into this, which is essentially a large mainframe? It's understandable to be worried when the DOE handles things such as nuclear secrets that sometimes slip into the hands of certain researchers, much like they were picking them up at a drive-through.

    I'm curious to see how the data will be encrypted/decrypted across such a vast system.

    --

    There is no spork.
    1. Re:Hmmm, This and the PS3 by tcyun · · Score: 2

      Check out the work being done by the GGF Security Infrastructure team, the GGF Certificate Policy Group, and the Internet2/MACE Shibboleth projects for a start on security work and research in the GRID realms.

  6. Whoo! by Accipiter · · Score: 3, Interesting

    Remember back in '69 when a few government agencies and universities put together a small little network called "ARPANet"?

    It started off with something like four nodes. Look where it is today.

    --

    -- Give him Head? Be a Beacon?
    (If you can't figure out how to E-Mail me, Don't. :P)

    1. Re:Whoo! by Monkelectric · · Score: 2

      this is actually the new method of invention ... Every few years, the government invents a new and better kind of network, we take it over, they get pissed and decide to make an even better one where the whole process starts over again. progress!

      --

      Religion is a gateway psychosis. -- Dave Foley

    2. Re:Whoo! by Our+Man+In+Redmond · · Score: 2

      Ummmmm, yeah. Just look at where it is today. Maybe we'd better pull the plug on the Science Grid now while we still have a chance.

      :)

      --
      Someone you trust is one of us.
  7. SETI by Yoda2 · · Score: 2, Funny

    I wonder if they'll run the SETI client on it during non-peak times. We could find nothing that much faster!

  8. Guess it's finally time to answer the question by drew_kime · · Score: 2

    According to this paper, the entire human life takes roughly a petabyte of storage.

    Looks like interesting times for AI researchers. Does AI require as many transistors as the brain has neurons? Does it require the same amount of storage and information? Is there something else needed? Looks like we're soon to answer at least one of these.

    --
    Nope, no sig
  9. ** Warning, mistake in above ** by PsiPsiStar · · Score: 2

    I copied some of the text into the wrong section.
    It should read:
    This is approximately 1 quadrillion bytes or 1,048,576 gigabytes.

    under petabytes, not terabytes.

    Slashdot regrets the error.
    I could care less. :P

    --

    ___
    It's the end of my comment as I know it and I feel fine.
  10. The scheme of it all by fruey · · Score: 5, Informative
    Go to the link about the actual project. Look at the PDF. It explains things quite well, it's a wicked thang that is happening...

    Here, for the lazy, are some of the objectives:

    • Computational modeling, multi-disciplinary simulation, and scientific data analysis with a world-wide scope of participants and the use of computing and data resources at many sites.
    • High Energy Physics data analysis that involves hundreds of collaborators, and tens of institutions providing data and computing resources
    • Observational cosmology that involves data collection from a world-wide collection of instruments, analysis of that data to re-target the instruments, and subsequent comparison of the observational data with simulation results
    • Climate modeling that involves coupling simulations running on different supercomputers
    • Real-time data analysis and collaboration involving on-line instruments, especially those that are unique national resources
    • Generation, management, and use of very large, complex data archives that are shared across global science communities, e.g. high energy physics data, earth environment data, human genome data
    • Collaborative, interactive analysis and visualization of massive datasets, e.g. DOE's Combustion Corridor project
    • Multi-disciplinary R&D that integrates the computing and data aspects of the different scientific disciplines.

    Thus, the applications are enormous. Not that you couldn't do it distributed across desktops à la SETI, but here we're talking data integrity, and let's not forget that even SETI has a kick-ass centralised server setup or the whole thing wouldn't work anyway.

    But especially interesting is the document filename:-

    DOE_Science_Grid_Collaboratory_Pilot_Proposal_03_14.nobudget.pdf

    Now, who can get me the version WITH the budget? I want it. Hehe.

    --
    Conversion Rate Optimisation French / English consultant
    1. Re:The scheme of it all by RobertFisher · · Score: 2

      While I think this is an interesting experiment in pooling parallel resources, there are also enormous challenges involved.

      Anyone who has ever used a parallel machine quickly realizes that in most "interesting" problems, a great deal of inter-processor communication is involved. Even apparently "trivially" parallelizable tasks, such as a CG ray-tracing of a shot from a movie scene, often carry bottlenecks which limit their degree of parallelization. For instance, in the ray-tracing case, even though each ray can indeed be traced independently of the rest, each processor must store the 3D volumetric model it is rendering in memory. Eventually the size of the volumetric model exceeds the memory capacity of the processor, and rays must then be swapped among processors. The same limitations apply to any number of other tasks -- data mining (where one needs to search for correlations in a huge volume of data, too large to be stored on a single processor), simulation (where hyperbolic, or even more bandwidth consumptive, parabolic or elliptic PDEs are often solved), etc...

      Achieving good load balance in parallel applications is a key challenge in computational science today. It's quite fair to say that on the current generation of IBM SP2s, which are the most common architecture in high-end computing, the parallel performance for most applications is poor at best. Slapping on an additional machine, with an even tighter bottleneck over the network between them, is not going to magically solve any problems. It is going to push the state-of-the-art of a very LIMITED set of applications a bit further, but a lot more work at the hardware and algorithmic levels needs to be done before MOST applications can really benefit from the scale of these machines.

      Bob

      --
      Science, like Nature, must also be tamed, with a view turned towards its preservation.
  11. Connectivity by ksw2 · · Score: 2

    I couldn't find how they plan on interconnecting the nodes... I've always thought setups like this were rather hindered by their ability to pass messages quickly between nodes. If it's just standard slow WAN link like a T1, I suppose this would end up becoming more like a distributed.net model, and less an actual 'supercomputer' like the headlines imply. If I'm correct, there's a rather large difference in the applications.

  12. Re:petabytes by Compulawyer · · Score: 2, Informative
    Of course, the "standard" 2^(10n) system of measuring bytes means nothing if you are a disk manufacturer. There, you just redefine (in VERY small print, of course) a megabyte (or other flavorbyte) as one million bytes.

    This gives us:

    • Disk megabyte = 1,000,000 bytes
    • REAL Megabyte = 1,048,576 bytes
    Difference = 48,576 bytes, or about 15 floppies worth of space per Mb. With Gb sized disks, the difference is almost 49 floppies per Gb. Definition is everything.
    --

    Laws affecting technology will always be bad until enough techies become lawyers.

  13. Get Linda Hamilton on the phone... by El+Camino+SS · · Score: 2


    Quick, someone tell Linda Hamilton to head for the mountains! Her unborn child will be the only one to stop all of this madness!

    1. Re:Get Linda Hamilton on the phone... by bryan1945 · · Score: 2

      Funny, "Science Grid" doesn't sound _that_ much like Skynet....

      *grin*
      .
      .
      .
      *duck*

      --
      Vote monkeys into Congress. They are cheaper and more trustworthy.
  14. Re:Could you imagine....no, seriously by daniel_isaacs · · Score: 3, Funny

    There is also a point after which keeping an old SGI isn't worth the cost of space, power and upkeep.

    And that point comes precisely 4 days 7 hours and 29 minutes after unpacking and turning it on.

    --
    - Dan I.
  15. I Would Be Really Amused... by istartedi · · Score: 2

    ...if they used it to run a simulation of climate and discovered that the Science Grid was responsible for global warming.

    (insert your comments about how hot Company X's chips run below)

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    1. Re:I Would Be Really Amused... by bryan1945 · · Score: 2

      Company X's chips run so hot, we now call them XXX!

      *puts on fire protection*

      --
      Vote monkeys into Congress. They are cheaper and more trustworthy.
  16. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  17. A little more information by pridkett · · Score: 3, Informative

    It's a little surprising that this got posted, because it's not all that earth-shattering news, but I'll provide some additional information about grids in general.

    There are a wide variety of systems like this that are either currently available or are being developed. Among them are Particle Physics Data Grid, NEESGrid and various European and Asian counterparts.

    The basic premise is to allow access to various resources you don't have at your desktop. This is not to be confused with putting all these computers together and forking a process a billion times and having it run all over the globe. It's more like saying: I have a process that requires 128 processors and 4GB of RAM, go find the resources and run it for me.
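
    The "go find it and run it for me" idea can be sketched roughly as below. This is only an illustration: the `Resource` record, the `find_resource` helper, and the memory figures in the machine list are made up for the example and are not part of Globus or any real grid scheduler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """A machine advertised to the grid (hypothetical record, not a Globus type)."""
    name: str
    free_processors: int
    free_memory_gb: int


def find_resource(resources, processors, memory_gb) -> Optional[Resource]:
    """Return the smallest machine that satisfies the request, or None."""
    candidates = [
        r for r in resources
        if r.free_processors >= processors and r.free_memory_gb >= memory_gb
    ]
    # Prefer the tightest fit so large machines stay free for large jobs.
    return min(candidates, key=lambda r: r.free_processors, default=None)


# Processor counts echo the story; memory sizes are invented for the example.
grid = [
    Resource("desktop", 2, 1),
    Resource("intel-cluster", 160, 80),
    Resource("ibm-sp", 2944, 1500),
]

match = find_resource(grid, processors=128, memory_gb=4)
print(match.name if match else "no machine can run this job")
```

    The tightest-fit rule is just one possible policy; a real scheduler would also weigh queues, allocations, and data locality.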

    Most of the systems use Globus, which is pretty much the de facto standard. There are other systems out there, such as Legion and Condor, which serve slightly different purposes.

    I've also seen some issues about security raised, so I'll mention them quickly. Globus is built upon an API called GSS (Generic Security Services); I believe it will soon (if not already) have an RFC published. This is a layer on top of various other security systems that may be local to the server running it. It can use Kerberos or PKI to do encryption across the network (don't flame me if it's wrong, I'm no security expert).

    When I wish to start using the grid, I start up my proxy that takes care of all authentication for me. Then my proxy connects to the gatekeeper on the remote machine which authenticates me based on my private key and then authorizes me via a mapping (usually just a text file). The task is then executed by the gatekeeper via the mapping on the remote machine. Input and output can be redirected over a secure layer if you so desire.
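
    The authorization step, where the gatekeeper looks an authenticated user up in a text-file mapping, might look something like this. It is a sketch under assumptions: the file layout (a quoted certificate subject followed by a local account), the subject names, and the `parse_mapfile` and `authorize` helpers are all hypothetical, and the authentication that precedes this step is not modeled.

```python
import shlex

# Hypothetical mapping file contents: a quoted certificate subject, then a local account.
MAPFILE = '''
# subject name                          local account
"/O=Grid/OU=Example/CN=Alice Example" alice
"/O=Grid/OU=Example/CN=Bob Example"   bob
'''


def parse_mapfile(text):
    """Build a dict from certificate subject to local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, account = shlex.split(line)  # shlex honors the quotes
        mapping[subject] = account
    return mapping


def authorize(subject, mapping):
    """Return the local account for an already-authenticated subject.

    Authentication (proving the caller holds the private key) is assumed
    to have happened before this step and is not modeled here.
    """
    try:
        return mapping[subject]
    except KeyError:
        raise PermissionError(f"no local account mapped for {subject}")


mapping = parse_mapfile(MAPFILE)
print(authorize("/O=Grid/OU=Example/CN=Alice Example", mapping))
```

    Keeping the mapping in a plain file means each site decides locally which certificate holders get which accounts.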

    My certificate is issued by an authority, in this case the Globus CA. The nice thing is that if you want to set up a grid of your own computers, you can get a cert from them too. Install Globus and it will tell you how.

    Certificates also allow you to get access to data. This allows me as user A to run program B at site C, providing results to user D at site E for a period of time F.

    It's all terribly neat and remarkably easy to install on your favorite Linux or Solaris box. It's also fairly easy to write programs to utilize the Grid thanks to the various CogKits for Python, Java and Perl.

    --
    My Slashdot account is old enough to drink...
  18. Re:And that's just for Bill's salary accounting... by bryan1945 · · Score: 2

    Now that you mention it, I'm extrapolating how long until Norton Anti-Virus takes up 1.3 petabytes.....

    For Windows- 2008
    Everyone else- 12,234

    *schwing!*

    --
    Vote monkeys into Congress. They are cheaper and more trustworthy.
  19. Latency by 4of12 · · Score: 2

    These grids are all great and wonderful as far as peak performance is concerned, but I'm wondering how the latency associated with long haul networks affects performance for a range of applications that are not embarrassingly parallel.
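
    A back-of-the-envelope model gives a feel for the question. Everything here is an assumption made for illustration: the work per step, the number of blocking messages, and both latency figures are invented, and the model ignores bandwidth and any overlap of communication with computation.

```python
def step_time(work_s, processors, messages, latency_s):
    """Time for one step: compute split evenly, plus serialized message waits."""
    return work_s / processors + messages * latency_s


def speedup(work_s, processors, messages, latency_s):
    """Speedup over a single processor that needs no messages at all."""
    return work_s / step_time(work_s, processors, messages, latency_s)


WORK = 10.0       # seconds of compute per step on one processor (assumed)
MESSAGES = 100    # blocking exchanges per step (assumed)

# Latencies are illustrative guesses, not measurements of any real system.
for label, latency in [("in-machine switch", 20e-6), ("long-haul link", 50e-3)]:
    s = speedup(WORK, 1000, MESSAGES, latency)
    print(f"{label}: speedup on 1000 processors ~ {s:.0f}x")
```

    With these made-up numbers the speedup falls from roughly 830x to about 2x once every exchange crosses the long-haul link, which is why tightly coupled codes tend to stay inside one machine.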

    --
    "Provided by the management for your protection."
  20. Re:And that's just for Bill's salary accounting... by Tuzanor · · Score: 2
    Microsoft has always pondered why nobody uses Windows on ultra-high-end hardware such as this. One reason is that organizations that get this kind of hardware want extreme customizability. Microsoft would have to allow these organizations reasonable access to their source code. Even if Microsoft were to do this, the terms for this would be highly strict, so most people figure, mah, the hell with it.

    Another point is the fact that Windows has only been released on a handful of architectures. To have systems such as this, you need support for ungodly amounts of memory. The best platform for Windows at this point is x86, which is limited without more hacks than are worth the time and money.

    Even with Windows NT on Alpha, Windows didn't even come close to tapping the full potential of the architecture. At the time Windows NT was the core product for MS servers, MS had a different agenda. Now that the Itaniums are coming, it's a good bet that MS may want to try their hand at this market...but I don't think they'll get far.

  21. Re:petabytes by Compulawyer · · Score: 2
    I have to note that you cite to a change that was instituted. Interesting, but all the CS books on my shelf still refer to the 2^(10n) measurement.

    I also note that when you format a 100 Mb disk, your OS (MacOS and Windows - probably *nix systems too - I haven't tested it) reports the volume size as about 72 Mb. I propose that the cause of the "problem" is the fact that disk manufacturers redefined the term to make their disks appear to have greater capacity.

    Think about this: if you are a consumer, do you really care if a megabyte equals 2^20 bytes or one million bytes? I propose that you do not - you simply care that everyone who uses the term "megabyte" means the same thing so you can accurately compare apples to apples. AFAIK, one megabyte of RAM is still 2^20 bytes of RAM. Why shouldn't it be the same for non-volatile media?

    How many consumers have called disk manufacturers or other help lines asking "Where did my space go? The label says 100 Mb but my computer says there are only 72. I want my other 28." If you adopt the "solution" you propose, you have to get the RAM industry and the OS providers to adopt it as well to be consistent.

    --

    Laws affecting technology will always be bad until enough techies become lawyers.

  22. Supercomputers - oink by Animats · · Score: 2, Flamebait
    Supercomputers are mostly a Government pork program. Notice that there are very, very few of them in the private sector. It doesn't make sense to have a supercomputer unless you have single problems that require large amounts of time on it. Supercomputers aren't economic as crunch engines - they cost more per MIPS than good desktop machines. That's because they're low-volume, hand-built machines.

    This is the fallacy of "supercomputer centers" and "supercomputer networks". You don't want 1% of a supercomputer; you want a machine of your own.

    There was a time when sharing big number-crunching machines made sense. Until the mid-1980s, there were commercial scientific computing service bureaus running big iron and selling CPU time. They're all gone, along with Control Data Corporation, Cray, and the commercial market for supercomputers.

    If you really want a shared big engine cheap, cut a deal with a big hosting provider for off-hours time on the server farm. Set up a Beowulf cluster of a thousand rack-mounted 1U servers, crunching from midnight to 6AM every night. All you'd really need to do is negotiate a bulk buy of offpeak-only shell accounts. All the machines are identical and the cluster has lots of internal bandwidth, so you can get real coordinated work done, not just the low-bandwidth stuff like SETI and cryptanalysis.

    1. Re:Supercomputers - oink by ryanwright · · Score: 2

      They're all gone, along with Control Data Corporation, Cray, and the commercial market for supercomputers.

      Tell that to IBM...

      Set up a Beowulf cluster of a thousand rack-mounted 1U servers

      Clusters have their own set of issues and problems.

      This is the fallacy of "supercomputer centers" and "supercomputer networks". You don't want 1% of a supercomputer; you want a machine of your own.

      But everyone can't have a machine of their own that processes huge parallel jobs. You have to buy one, and share it between many users. So while you may only get 1% of a supercomputer's time, during that 1% of time you can use 10-100% of its power. Considering the type of jobs we're talking about, that's a hell of a lot better than having a regular desktop crunching 100% of the time. It could take months to complete a job that could be done in an hour on a supercomputer, and waiting months for each step during your research would really suck.

      The fact remains, supercomputers are not dead. They're still widely in use and people are still buying them for good reason.

      --
      -Ryan, with the unoriginal sig
  23. Re:petabytes by Compulawyer · · Score: 2
    Floppies (3.5" versions - DS, HD) as formatted for IBM, can hold 1.44 Mb of data. That is 1.44 REAL megabytes - the 2^20 kind.

    So, here is the math I should have done:

    • 1 Kilobyte (Kb) = 2^10 bytes = 1,024 bytes
    • 48,576 bytes / 1,024 bytes/Kb = 47.44 Kb
    So the real difference is about 47K, well under the capacity of a single floppy. I screwed up the math. Too much time practicing law, not enough number crunching. My apologies.
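
    The corrected arithmetic can be checked in a few lines. This is only a sketch: the constant names are made up for the example, and the floppy size follows this comment's own definition (1.44 megabytes of the 2^20 kind).

```python
DISK_MB = 10**6              # megabyte as disk makers define it
BINARY_MB = 2**20            # megabyte as RAM and most operating systems count it
KB = 2**10                   # kilobyte of 1,024 bytes
FLOPPY = 1.44 * BINARY_MB    # floppy capacity as this comment defines it

# Shortfall per megabyte, in bytes and in kilobytes.
diff_mb = BINARY_MB - DISK_MB
print(diff_mb, "bytes =", round(diff_mb / KB, 2), "KB per megabyte")

# Shortfall per gigabyte, in bytes and in floppies.
diff_gb = 2**30 - 10**9
print(diff_gb, "bytes =", round(diff_gb / FLOPPY, 1), "floppies per gigabyte")
```

    So the per-gigabyte figure of almost 49 floppies from the earlier post holds up under that floppy definition; only the per-megabyte figure was off.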
    --

    Laws affecting technology will always be bad until enough techies become lawyers.

  24. GRID Computing by aallan · · Score: 2

    GRID Computing is the current sexy term in scientific computing, but it's something that is so vague that it can mean all things to all people. Which is perhaps why it's suddenly so popular: everyone can get their pet project funded.

    To some people it means actual hardware: routers, fibre, supercomputers, that sort of thing. Certainly in the UK and Europe this group consists mostly of Particle Physicists, see the GridPP Project Homepage for details of what's going on there...mostly the Particle Physicists seem to have ridiculous amounts of data on their hands (Petabytes/day) that they have to ship. Fun stuff!

    To the astronomical community it means software, virtual observatories, data mining and intelligent agents. In the UK and Europe have a look at the AstroGrid and the AVO projects. Although some of us are talking about hardware, the project I'm working on for instance, eSTAR, is putting robotically operated telescopes onto the GRID. However even here the main focus of the project is on the fun stuff we can do with the software, intelligent agents and data mining spring immediately to mind. In the US the NVO is the main focus of GRIDs for the astronomers there...

    Al.
    --
    The Daily ACK - Eclectic posts by yet another hacker
  25. Re:Brain Mapping? by joib · · Score: 2

    IANAB (biologist), but I think the problem is that nobody understands exactly how the brain works. Yes, we know that there are these neurons sending electrical signals to each other, but I don't think there is any theory on how this ultimately gives rise to the cognitive processes in the brain. Not that I'm saying that supercomputers would be useless in brain research; this article mentions some IBM guy planning to simulate how the "electric storms" during an epileptic seizure propagate, or something like that.

  26. Re:petabytes by mysticbob · · Score: 2
    definition is everything. fortunately, we don't have to guess anymore, or explain 'real' versus 'power-of-ten' bytes. standards are here, have been for years, and are your friend:

    • one kibibit 1 Kibit = 2^10 bit = 1024 bit
    • one kilobit 1 kbit = 10^3 bit = 1000 bit
    • one mebibyte 1 MiB = 2^20 B = 1 048 576 B
    • one megabyte 1 MB = 10^6 B = 1 000 000 B
    • one gibibyte 1 GiB = 2^30 B = 1 073 741 824 B
    • one gigabyte 1 GB = 10^9 B = 1 000 000 000 B
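
    a small sketch of what using the two sets of prefixes side by side could look like. the `humanize` and `both` helpers are made up for this example, and reading the story's 1.3 petabytes as a power-of-ten figure is an assumption.

```python
SI = ["B", "kB", "MB", "GB", "TB", "PB"]
IEC = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def humanize(n_bytes, base, units):
    """Scale n_bytes by repeated division by base and attach the matching unit."""
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


def both(n_bytes):
    """Return the same byte count in power-of-ten and power-of-two notation."""
    return humanize(n_bytes, 1000, SI), humanize(n_bytes, 1024, IEC)


print(both(2**20))         # one mebibyte, in both notations
print(both(13 * 10**14))   # the story's 1.3 petabytes, assumed to be an SI figure
```

    printing both forms next to each other removes the ambiguity without anyone having to agree on which megabyte is the 'real' one.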


    National Institute of Standards