Grid Computes 420 Years Worth of Data in 4 Months
Da Massive writes with a ComputerWorld article about a grid computing approach to fighting malaria. By running the problem across 5,000 computers for a total of four months, the WISDOM project analyzed some 80,000 drug compounds every hour. The search for new drug compounds is normally a time-intensive process, but the grid approach did the work of 420 years of computation in just 16 weeks. Individuals in over 25 countries participated. "All computers ran open source grid software, gLite, which allowed them to access central grid storage elements which were installed on Linux machines located in several countries worldwide. Besides being collected and saved in storage elements, data was also analyzed separately with meaningful results stored in a relational database. The database was installed on a separate Linux machine, to allow scientists to more easily analyze and select useful compounds." Are there any other 'big picture' problems out there you think would benefit from the grid approach?
This seems like a waste of computing resources to me. How many people do you know with malaria? Cancer is far more prevalent.
join teh gnaa if u r tootaly a winrar
The search for new drug compounds is normally a time-intensive process, but the grid approach did the work of 420 years of computation in just 16 weeks.
Cue the stoners in 5, 4, 3, 2....
Wizard Needs Food, Badly
It strikes me as strange that something like Wikipedia could not be distributed across users' PCs in more of a peer-to-peer fashion. Surely the web itself could benefit from further decentralisation. This issue bothered me some years ago, when I discovered that my desktop PC at work had about 40 GB of unpartitioned disk space. I often wondered about the sense of running file servers in big organisations, when each user probably has a few tens of gigabytes of unused or unpartitioned disk space. If illicit music and video can be distributed by P2P, why not all information?
If the grid solution finds THE cure for H5N1, will it be patentable? If not, who pays for the R&D to implement it? Who gets the patent? Do the thousands of people who allowed their PCs to be used get anything? Will big drug companies be able to use this and keep the prices low for the final product?
Support NYCountryLawyer RIAA vs People
sorry, i missed that definition. what is that in library of congresses per human hair?
(420 years / 16 weeks) / 5000 computers = 1:4 scalability!!!
Frickin amazing! No one's EVER done that before.
how abouutt a drog thet maks slshdaughters spel gooder and youze gooderest grammer?
"Tu fui, ego eris" - Virgil
Would it be possible to use all that computing power to make an electronic voting machine that works?
Oh wait! How about a voting machine based on "quantum computing"! Then we wouldn't even have to vote, the machine would already know who won.
Goddamn liberal qubits! Bunch of flip-floppers!
Stupid conservative qubits! They think that there is ONE and ONLY ONE answer for everything!
Based on the size of useful data GRID collected from 5,000+ machines and the quantity of pornography on my computer, they are claiming that: porn != useful.
...GRID computing; you disappoint me.
In an amazing breakthrough which will no doubt have profound implications for Moore's Law, it has been discovered that multiple computers can accomplish in a shorter time what would take much longer on a single computer! Researchers will next launch a study to see how much faster 6000 video iPods working simultaneously can play through all the songs on the iTMS compared to a single first generation iPod shuffle.
One time I threw a brick at a duck.
Perhaps if enough PCs were put to the task they could create new Shakespearean masterpieces. 64bit Night and all that.
To do something right, you often have to roll up your sleeves and get busy.
I don't know - the MOSIX cluster at work achieves about the same efficiency... I.T. won't listen that there is something incredibly wrong, as I am employed as a statistician so obviously know nothing about computers!
:(){
Are there any other 'big picture' problems out there you think would benefit from the grid approach?
Using distributed computing to find molecular cures or the shape a protein will fold into as it comes out of the business end of a ribosome is all well and good, but if you can find a shape on the political map that concentrates all left-wingers into little ghetto-districts and gives solid 55% majorities to right-wingers across all the other ones, and you can deliver this IT solution around the time of the 2010 reapportionment, the Bush Administration has a no-bid contract with your name on it.
Up to a full megawatt or more for sixteen weeks. How much does that cost where you live? Still, it's a great bang for the buck. So how long would it take with a Beowulf cluster of these?
What?
4/20 is the birthday of Adolf Hitler. Therefore, this is very evil.
Plus, 4+20-1 is 23. ((4*20)-11)/3 is also 23. More proof of its evilness.
I'll take a Gin and Tonic (with Quinine, of course).
out there?
Yes. For example, how did a carpetbagging bitch who murdered her accountant get to be the senator of New York and the leading Democratic presidential candidate?
(Preface - My research group specializes in parallel computing) There are classes of problems so computationally intensive that the computers that can do them in a reasonable amount of time won't be invented for decades. Almost all of these are simulations of physical reactions (in vitro drug simulation, climate simulation, biomolecular engineering sims, physics sims, etc.). As a general rule, these problems scale weakly (meaning that as you add more computers, you can simulate more datapoints, and get more accurate results). If memory serves, the hardest problem I can recall involved hydrogen fusion simulations, requiring computers 10-1000 times faster than the best in the world today.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
http://www.majestic12.co.uk/
I think that a lot of the world's problems could be solved like this; the downside of course being that some guy on an island would have to sit underground and insert 4 8 15 16 23 42 over and over. (prays someone who kept up with the Lost Experience read that and understood) ha
That 420 was based on some benchmark. Perhaps a 1GHz Pentium or something. Perhaps the average CPU on the grid was higher.
I wonder if I use bold in my signature, people will notice my posts.
You know, I think the thing that aggravates me the most is that these distributed computing systems are helping drug companies find cures to illnesses using OUR processing power and computers WE paid for, only to sell us the drug that they would have been hard pressed to develop without our hardware back to us at an extremely inflated price.
Working on generic PCs using idle CPU, that's probably pretty good, right? These aren't dedicated grid computers as far as I can tell.
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
You're glossing over some important points.
1) I'm pretty sure that the servers have to send the same job out to multiple clients. That is, you can't assume that it's sufficient to have only one computer return a result for one job. There's the possibility that the result is incorrect or never returned.
2) The point of grid computing is to reduce both the cost and time required to do the computation. The entire endeavor would be more efficient if you had full control over the entire grid, i.e. a huge cluster. The entire endeavor would also be more expensive. (Last I heard, Sun is having a hard time finding customers for its grid computing service. It might just be that $1/CPU-hour is too expensive.)
3) Distributed computing of this sort depends on unused CPU cycles. You can't expect 100% CPU utilization out of all 5,000 machines that took part in the project.
So, what are you comparing this to in order to arrive at your cynical conclusion?
Did you consider the difference in cost?
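To make point 1 concrete, here is a minimal sketch of how a server might accept a job's result only once enough replicas agree. It is an illustration under assumptions, not a description of how WISDOM or gLite actually validated anything: the function name `validate_result`, the `quorum` parameter, and the use of `None` for a client that never reported back are all made up for the example.

```python
from collections import Counter


def validate_result(replica_results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None.

    `replica_results` holds one entry per client that was sent the job
    (hypothetical format); None marks a client that never returned anything.
    """
    returned = [r for r in replica_results if r is not None]
    if not returned:
        return None  # nobody reported back: the job must be reissued
    value, votes = Counter(returned).most_common(1)[0]
    return value if votes >= quorum else None


if __name__ == "__main__":
    print(validate_result([42, 42, None]))  # two clients agree, one is missing
    print(validate_result([42, 41, None]))  # no agreement yet
```

Every job that goes out two or three times costs two or three times the CPU, which is one reason the raw machine count overstates the useful throughput.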
The development of models to find relationships among individuals based upon their phone records, email communications, webpage preferences and other easily recorded and identified tidbits of digital transactional receipts. Of course, I'm sure that there are various three letter agencies already well ahead of me on that one. (Hi guys!)
Worse...
It's over 4 months, not a fraction of a second.
If I have a task that takes 100 seconds to run and I want it completed in under a second, scalability becomes a challenge... I have to figure out how to break it into at least 100 distinct parts and deal with all of the communication lags associated. To have any kind of fault tolerance, I probably want to break it into at least 1,000 tasks so that if one processor is running fast, it can get fed more and if one processor corrupts its process, I don't find out right at the end of the second, with no room to compensate, that I have to re-run that full second's worth of processing elsewhere to make up for it. That's where the challenge comes in.
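The effect of finer chunks can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function name `makespan`, equal-size chunks, zero communication cost, and a pull model where each chunk goes to whichever worker frees up first.

```python
import heapq


def makespan(total_work, worker_speeds, num_chunks):
    """Simulated finish time when equal-size chunks are handed out on demand.

    total_work is in seconds of work for a speed-1.0 worker. Each chunk goes
    to whichever worker becomes free first (ties broken by worker index).
    Communication lag is ignored, which flatters the fine-grained case.
    """
    chunk = total_work / num_chunks
    # Min-heap of (time the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(worker_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_chunks):
        t, i = heapq.heappop(free_at)
        done = t + chunk / worker_speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    # 100 workers, one of them running at half speed (made-up numbers).
    speeds = [1.0] * 99 + [0.5]
    print(makespan(100, speeds, 100))   # one big chunk each: the slow box sets the pace
    print(makespan(100, speeds, 1000))  # small chunks: fast boxes pick up the slack
```

With 100 chunks the half-speed machine holds everyone up until the 2-second mark in this model; with 1,000 chunks the run ends a little over one second in, because the fast machines simply get fed more.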
If I have a task that takes 100 seconds to run and all I'm trying to do is run it a lot of times over a period of time that's many times greater, I can run it 864 times a day per system with absolutely no scalability issues whatsoever and simply send the relatively small complete result sets back. With 100 systems, if each one can run a distinct task from start to finish, I'd be expecting pretty much dead on 100 times the total number crunching as there are absolutely no issues with task division, synchronization or network lag.
In this case, they ran 5,000 computers over 4 months. Assuming a single task is solvable in under 4 months by a single system, they should have had no difficult task division problems to solve, absolutely minimal synchronization issues and next to no lag issues to address. In short, even a pretty inefficient programmer should be able to approach 1:1 scalability in that easy of a scenario.
Efficiency of algorithms is a challenge when you want a single result fast. When you want many results and are prepared to wait so long as you're getting very many of them, that's an incredibly easy distributed computing problem.
I've been donating my processor time to the malaria research for quite a while now. Even though the drug companies will probably benefit from my donation, they would not have these breakthroughs if people didn't donate that time, and the prospect of a breakthrough is what keeps me donating. It's a great feeling knowing that I've contributed to a possible cure for this disease! As for the original question, other projects that could use grid computing are imaging analysis (any field), physics (particle research, etc.), and computer animation, where the time to render would be greatly reduced, allowing movies and shows to be released much faster than before. (With this application, you would know that you are contributing to a product that a company will profit from, and the only reason to do it is to get these movies and shows to market faster. I, for one, would love to see a sequel to The Incredibles, and to be a part of that would be fantastic, even just to have my name mentioned in the credits!) One thing that needs to be done for these projects to get maximum exposure is to dumb down the process. A noob would be hard pressed to set up BOINC Manager to do the malaria research.
Much of this discussion is totally misdirected because the writers are confusing a distributed computing project like SETI or BOINC - http://en.wikipedia.org/wiki/BOINC_client-server_technology - with a grid system - http://en.wikipedia.org/wiki/Grid_computing. They are completely different things.
Gus Gorman has a patent on this already called the Ultimate Computer, I believe it's in the Grand Canyon.
It did 4 months worth of computation in 4 months. If it had been 420 years worth of computation it would have taken 420 years. It's like infomercials that say "and you get all this, a $899 value, for $30!" Obviously it's not "a $899 value" or you would be selling it for that instead of $30. Perhaps, though, they mean that they did 420 processor-years of work over the course of 4 months (meaning that they would have had an average of 1260 cores doing something useful at any time).
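The processor-years reading is easy to check. A small sketch, assuming one core per machine and taking "4 months" as a third of a year (the function names are just for illustration):

```python
def average_busy_cores(cpu_years, wall_years):
    """Average number of cores that must be busy to deliver cpu_years in wall_years."""
    return cpu_years / wall_years


def utilization(cpu_years, wall_years, machines):
    """Fraction of the pool that was busy on average, assuming one core per machine."""
    return average_busy_cores(cpu_years, wall_years) / machines


if __name__ == "__main__":
    # 4 months taken as 4/12 of a year, as in the comment above.
    print(average_busy_cores(420, 4 / 12))   # about 1260 cores
    print(utilization(420, 4 / 12, 5000))    # about 0.25 of the 5,000 machines
    # The summary's "16 weeks" (at 52 weeks per year) gives a slightly higher figure.
    print(average_busy_cores(420, 16 / 52))  # about 1365 cores
```

Either way the pool averaged roughly a quarter of its nominal capacity, which is what you would expect from machines that have other jobs to run.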
or just 420 years worth of junk data ?
I imagined a beowulf cluster of those, nekked and petrified. Then I got ashamed of myself for rehashing the old meme and dumped hot grits in my pants. As I was convulsing on the ground, there was only one thought left in my mind:
"Does it run Linux?"
sorry... nothing to contribute today but my name {thats what alcohol does... if i was (just) stoned, i would have a few pages to contribute}....damn alcohol....
have you seen my sig? there are many others like it but none that are the same
Grid computing is ill-defined, but it is about distributed processing, as is BOINC. BOINC (which now powers SETI@home) is a potential component or subset of what is now considered to be Grid. Some similar systems preexisted the current work on Grid, e.g. Condor, but then the Grid concept was first referenced at the end of the 1960s. Grid now adds concepts such as services (again not a new idea), and also workflows (again not new) and also work on data distribution.
The work that was done under WISDOM was unlike SETI@home in that it was distributing work to clusters (e.g. GridPP machines) rather than individual machines using gLite from EGEE, with local job managers then taking the work to the individual machines. SETI@home uses BOINC as the end-to-end work distribution system. Reading between the lines, the Storage Resource Broker (SRB) was probably used to control data access.
So actually it is just 4 months of data with the new standard they set.
This may be a bit obvious to all you /.rs ... but what the hell...
http://setiathome.berkeley.edu/
SETI@home is a scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data.
*BSD is dead. reaper Nor do tyhe
If the headline were "NEW MALARIA DRUGS FOUND WITH AID OF GRID COMPUTING" I would be much more impressed.
It's all well and good to tie a big grid to a problem, but if you don't ask the right questions, you won't get useful answers.
Are there any significant grid computing success stories?
-pvh
"The wise man proportions his belief to the evidence." -- David Hume
Your work computer can be managed by the company you work at; they can even revoke root if they're concerned about security. There are actually a few existing distributed filesystems for Linux, though most of them suck, and the few I've seen with the potential not to suck either cost money or are a long way from being stable on Linux. Haven't seen ANY of these on Windows.
Someone mentioned backup, which isn't a big deal. Ever heard of RAID? Yeah, it could be something like that.
Although if it's a desktop PC, the 40 gigs probably isn't worth the power required to keep your computer on, and they're probably better with insanely aggressive local stores to speed up disk access. But again, these kind of suck for Linux.
The closest thing to a solid, clean design that I can find for Linux network filesystems is NFS, and that doesn't have any of the features we're talking about -- not without combining it with one of these other kludges...
Don't thank God, thank a doctor!
Are there any other 'big picture' problems out there you think would benefit from the grid approach?
I can think of two:
this
...and this.
Which is, according to all evidence and experience, a most... intriguing problem to a male consciousness. Now that there is the technology available, we should _FINALLY_ take advantage of it and find out about the secret inner workings of this giant enigma! Go grid computing! - Michael
Why should I volunteer my time and money to subsidize some company when they will make millions(billions) of dollars a year?
Do you volunteer for Google? Microsoft? Or Target, for 4 months without pay?
My taxpayer money already goes to the NIH, which subsidizes all the drug development research in the world. What do I get out of it? 40 dollars per tablet medicine?
Fuck this. They want my work, they will pay me. They want 5000 computers, I will set up the servers, they will pay my workers and my company money. End of the story. This is no fucking communism where all the workers work for peanuts and scum bags daughters fucking fly on super sonic jets from moscow to paris so she can fucking buy perfume.
Fuck this project, fuck it with a blunt spoon.
It's called information assurance. There are reasons that a NetApp/EMC array costs $25,000 per terabyte when a 1TB Maxtor USB drive costs less than $1000. The first is that it is made to tolerate faults and be redundant. Sure you could do this in an enterprise, but then you end up with massive duplication to get around people turning off their computers, a massively expensive and complex distribution and tracking system, and higher failure and lower performance of desktop drives that are now running your webserver and processing an Access DB for the local user. The second is speed. Data IO has always been my limiting factor. Working with any kind of media, large databases, etc., speed is king. Not only are desktop drives single channel and slower (RPMs), but they are now separated over a network that may already be flooded. Information assurance isn't cheap, but it's worth it.
People who think they know everything really piss off those of us that actually do.
Kinda like what the mice were doing, before Earth was destroyed by the Vogons...
"In this case, they ran 5,000 computers over 4 months. Assuming a single task is solvable in under 4 months by a single system, they should have had no difficult task division problems to solve, absolutely minimal synchronization issues and next to no lag issues to address. In short, even a pretty inefficient programmer should be able to approach 1:1 scalability in that easy of a scenario."
Here you are confusing CPU time with wall clock time. The task may have taken 4 months of wall clock time, but what proportion of those machines' time was running the WISDOM tasks? Given that the UK machines were from the GridPP project I would guess that the machines were only running the jobs part of the time, and in fact the article notes that this was the case. So the scalability issue (or this part of it) is a red herring as what the analysis actually showed was that using CPU resources that would otherwise have gone unused led to the job being run in 4 months (granted the electricity cost is not zero, though). Actually given how heavily loaded many clusters are in the UK I am surprised it didn't take longer than 4 months (as that implies only 75% cluster load from other jobs, whereas 90% is often more typical).
the question is, were they 5,000 GRID computers?? like these? heheh
*plays the Apogee theme song music*
No, no, no - you are talking about 4 months when using CPU resources not otherwise being used by other things on clusters with another primary purpose. What you are complaining about is like admonishing someone who says they spent 72 hours building a model kit for not having started it on Monday and finished at the last moment of Wednesday.
420 years huh.... niiiiiiiice
Mars global orbiter has sent back high resolution pictures of the surface that would take scientists a few years to scan to find definitive signs of water on Mars. This would be a perfect candidate for such a system.
no one's ever done anything like this before.
this really is news!
Malaria would be a forgotten disease if the ecopagans hadn't outlawed DDT.
Tens of millions of human beings [typically brown & black, and suffering in the most politically correct of third world cesspools] die every year because of our arrogant and narcissistic obsession with this pagan religion.
There have been lots of responses already, but I would like to add another...
There seems to be a widespread fallacy that all human resources should be applied to the One Biggest Problem facing humanity at any given moment. Overlooking for a moment the obvious problems inherent in trying to choose the One Biggest Problem, and assuming we could actually rank all human problems in a well-defined order, there are still two huge problems with this approach:
1. Diminishing returns. Putting twice as many people on a problem doesn't solve it twice as quickly. The extra people could well be more productive working on a separate problem. This is the well-known fallacy of the Mythical Man-Month.
2. Misplaced priorities. The majority of people in the world do not have cancer. If all the resources of humanity were spent on cancer, where would that leave the rest of us that don't have cancer? "Sorry, we've stopped making antibiotics, insulin, toothpaste, books, and clothing so we can focus on fighting cancer."
In addition, there's an implicit assumption in the parent poster's position that the researchers who are looking for a cure for malaria have been wasting their time. I'd like to ask, what has *he* been doing during this time? I hope he has been looking for a cancer cure, or else he's nothing but a hypocrite.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
It's worth mentioning that malaria was nearly wiped out by the simple and inexpensive use of DDT before Rachel Carson and her sympathizers managed to get the stuff banned. And 35 years later, pretty much all her arguments have been shown to have been fabricated. But hey, only 30 million+ died as a result.
It's nice to know that grid computing can be used to evaluate the potential of all those compounds, of course, as there are certainly applications for that. But the context of the current test is one that we should be ashamed is necessary.
--- Bill
...is "Grid computing finds cure for malaria."
I could look through the threads of my bedroom rug for 420 years and not find the cure either.
Eyes on the prize, people.
Since the work took 4 months, it implies that each machine was used for about 25% of capacity. Now, the big point would be that they got the work done for free other than overhead costs (creating the grid software and networking, etc.). Which ain't bad.
The lessons of history teach us - if they teach us anything - that nobody learns the lessons that history teaches us.
Can Entropy be reversed?
The only thing new in this world is the history that you don't know.[Harry Truman]
Are there any other 'big picture' problems out there you think would benefit from the grid approach?
;p
How about "Global warming: Are humans affecting the environment on Earth"?
I don't think 5k computers can solve it, better increase the numbers tenfold or more.
Carbon based humanoid in training.
With a botnet of a few hundred thousand machines, brute-forcing the crypto application of your choice would immediately come to mind. Whether that would be one of the better uses of the botnet is questionable, but hey, if you have something that's really important to you to try to crack...
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
Another grid computing project is the World Community Grid. Members have contributed over 75,000 CPU years to several projects. See http://www.worldcommunitygrid.org/.
The last time I checked there were over 260,000 members. Over 100,000 have joined a team. There is a slashdotusers team (one of the larger teams) as well as the one I am in UserFriendly.Org.
"...Are there any other 'big picture' problems out there you think would benefit from the grid approach?..."
One of today's greatest problems facing all humanity is Gravity. Use the Grid to solve Anti-Gravity.
Nope, Folding@home and the article are talking about the same thing.
:)
We do over 420 years of compute time every day tho
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
This sort of stuff is part of my job. Full disclosure: I am a molecular modeler at a large pharmaceutical company. People always want this method to work, and it seldom does. Why? The models are too primitive, automated data analysis will miss true hits, and the signal-to-noise ratio is very low. First of all, what they are looking for is properly termed a hit, not a drug. The odds against a drug (i.e. the final chemical entity that gets FDA approval) coming out of this type of screen are beyond astronomic.
Secondly, you need to understand what they're actually doing. They have a rigid model of a receptor/enzyme/protein that is relevant to malaria. Then they dock what's called a library of compounds into that receptor and compute a score based on various interactions and/or properties. Almost surely explicit solvent is not taken into account, among a host of other simplifications. Probably then they will have to use some kind of cutoff. What will that be? Well, it depends on the actual number of compounds they can test in a real assay. If it was this easy, don't you think there'd be way more drugs out there, and development wouldn't cost millions of dollars? Trust me, I see these kinds of kooky proposals all the time, and while I wish them the best, I don't hold out too much hope.
An experiment I've done myself: compare the results of an experimental high throughput screen on 1.5M compounds against a virtual screen (very much analogous to what was run here). Result: 1% overlap of the top 20% of compounds from either screen. Shouldn't the model be able to reproduce reality better?
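For anyone wanting to run the same kind of comparison, here is a rough sketch of one way to measure it. It is a guess at the metric, not the poster's actual procedure: the function name `top_fraction_overlap`, the dict-of-scores input, and the "higher score is better" convention are all assumptions (many docking programs report scores where lower is better, so flip the sign first if that applies).

```python
def top_fraction_overlap(scores_a, scores_b, fraction=0.2):
    """Share of the top `fraction` of compounds that two screens have in common.

    Each argument maps compound id -> score, with higher meaning better
    (an assumption for this sketch). Only compounds present in both
    screens are ranked.
    """
    common = set(scores_a) & set(scores_b)
    k = max(1, int(len(common) * fraction))

    def top(scores):
        ranked = sorted(common, key=lambda c: scores[c], reverse=True)
        return set(ranked[:k])

    return len(top(scores_a) & top(scores_b)) / k


if __name__ == "__main__":
    # Ten made-up compounds whose two rankings are exact opposites.
    assay = {f"cmpd{i}": i for i in range(10)}
    virtual = {f"cmpd{i}": -i for i in range(10)}
    print(top_fraction_overlap(assay, virtual))  # no shared top picks
    print(top_fraction_overlap(assay, assay))    # identical rankings
```

Two unrelated rankings would be expected to share about 20% of their top 20% by chance alone, so a result near 1% would be worse than random rather than merely unimpressive.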
This kind of method can work in limited cases and with much smaller numbers of compounds. Brute force rarely does.
Brain: "Pinky, tonight we will take over the world!"
That sounds like a good one!
Vote monkeys into Congress. They are cheaper and more trustworthy.
420 years of computing.. hmm I wonder where they pulled that number from....
What drugs did they say they were "researching?"
That much computing power could drastically cut the amount of time needed to brute force private keys, so why not leverage it to break the DRM keys used to encrypt the new music/videos/etc?
Grid computing, or other technologies which provide for synergies, may help cure maladies such as the common cold, HIV, global warming, cancer, etc. Why not? Better hurry, as Bush will be bombing Tehran by the end of April....
I stand behind my assertion that the project described in the article about "420 years of computing" is not running on the same kind of system that Folding@home is running on. The two Wikipedia links are fairly clear and the system described in the original article is not the same as that used by Folding@home. Gil
There's been a lot of talk that the "WWW" may become the "GGG", the Great Global Grid, and I think it might actually happen. The network will absorb not just the information, but the computing power too...