Visions Of The Future Of Grid Computing
CaptianGrid writes "Computing grids, or software engines that pool together and manage resources from isolated systems to form a new type of low-cost supercomputer, have finally come of age. BetaNews sat down with some of the world's leading grid gurus to discuss the significance of such distributed technologies and separate grid hype from grid reality."
If I had a grid computer, maybe I would've been able to get first post.
Then, more recently, we have seen Univa created, a company I am involved in as founder and advisor.
Univa
Univac - a successor of Multivac, the largest computer in Asimov's world.
Nerds - they get everywhere.
I guess now that the power of a single CPU (GHz and instructions per clock) is leveling off, this seems like the only way to increase computing power: hook lots of CPUs together. Hopefully we will be able to find some answers for SETI or cure a cancer or two for the Folding projects. It would be nice if the commercial grids also helped out those projects by giving them their spare cycles. GRIDS CRUSH SINGLE CPU.
Imagine!
Check out Apple's X-Grid technology!
It runs on any OSX system, 10.2.8 and up. Put your spare cycles to work.
Xgrid: High Performance Computing for the Rest of Us
Never ask for directions from a two-headed tourist! -Big Bird
The article mentions the commoditization of grid computing through adherence to a set of standards, but past a certain point it makes little sense for IBM or Sun to make their tools interoperable, since that would diminish the consulting value-add they sell on top of the grid resources they offer.
I think that for full standards compliance, you'll need to look to companies which don't offer their own computing resources -- platform-agnostic companies. But then who do you buy the compute resources from? Unless you're buying your own systems for use (which makes "utility computing" less viable), it's a bit of a catch-22.
If you want examples of operating systems that help with gridding, check out Plan 9 from Bell Labs and its sister project Inferno. The nice thing about Inferno is that it runs hosted on Linux, Windows, Mac OS X, and Plan 9, as well as on native hardware.
Free of Flash! Free of Flash!
There will never be a substitute for a single box with a lot of CPUs in it. For tightly coupled datasets, the latency of a grid will be a limitation.
Transcend Humanity. Please.
Computing grids, or software engines that pool together and manage resources?
Pure Bolshevism, that's what!
When all the switching and installation are done from cheap-labor countries, a lot more techies will be out of work.
Table-ized A.I.
What we provide is primarily an implementation of Web services standards to allow people to build services, and the primary goal is also for us to provide a set of pre-defined services that allow you to use Web services protocols to interact to request the allocation of compute resources, the creation of computational services and moving the data from one place to another and so forth.
Does this sound like Carly Fiorina attempting to explain HP's strategy to anyone else?
The new generation of marketeers use "Grid", but they are rarely referring to what computer science engineers call grid clustering. I think the marketeers say "Grid" when they really mean virtual operating systems running on abstracted hardware platforms: either a mainframe or some otherwise kick-ass multi-way system that has been virtually partitioned, or something like VMware piecing together several x86-style servers.
Frankly, I don't like the word Grid being applied in this way. However, the latter technology (virtual OS) is fascinating and will come to dominate computing in the next few years.
The basic idea is total abstraction of the application/service from hardware/location. The app gets the resources it needs, can be cloned/replicated to another location for disaster tolerance, and can scale and grow on demand by simply throwing more hardware modules at it. It's not limited to computing; it also applies to storage and networking.
Someone you trust is one of us.
Look, the bottom line is there is nothing new here, just new sets of buzzwords. You have been able to submit massive computer jobs to IBM or Sun (with their insane $1/cpu-hour), or even most college campuses (the U of Minnesota had such systems) for the last 35 years. MPI/PVM standardized and commoditized the clustering side of things long ago.
;)
Globus is now "web services" and not "GRID". GRID is so last century. It's far more cool now that it's in Java too. Anyone still working on GRIDs should search/replace immediately!!!
And did they drop the name of every single business partner they have in that article, or did only I notice that?
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Hasn't the blackout taught us to move away from GRID type setups? If people just created their own power the blackout would have affected us less. Could this principle not be used for home computing? Rely on yourself and not on others?
Live forever, or die trying.
While reading TFA, I couldn't help wondering what the overhead of a Web services-based grid solution might be, and how that overhead would be compounded by frequent communication among the grid nodes.
Tyranny isn't the worst enemy of a democracy. Cynicism is.
I have started to look into having some of my cheaper machines grid together into a nice cluster, though I haven't found a solution to something I thought would be necessary for this kind of environment: thread migration.
Sure, it may be much harder than migrating a whole process, but too often spawning whole processes is simply not the answer to SMP programming.
Thus far, I have looked at Mosix/OpenMosix and OpenSSI, and both fail here. Can anyone give me some insight perhaps? Maybe I am missing something.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Just do all your computation in whatever hemisphere is in winter. They can use the heat.
The only problem with this kind of setup is its limited ability to accomplish anything useful for a consumer or a medium-sized company. While it is of course an interesting field, and one that needs to be researched, technology like proximity computing (Sun) is what will dictate the technology of the future. It's hard enough to get decent multiprocessor scheduling without too much overhead on a single PC; the overhead incurred with grids would be enormous (I guess that's why the primary applications would be file storage etc.). Proximity computing, on the other hand, is an innovative approach that doesn't try to solve the problem in place, but avoids it altogether.
What I want to know is, is there any way to sell my unused cycles on the open market? I love SETI and all, but making some $$$ would be super cool.
It is a combination of grid and virtualisation.
Grid in the sense that if my datacenter needs more resources, I just plug in a blank PC with extra CPU/MEM/Disk and don't worry about it. Or if one goes bad, I just rip it out without worrying about what it will destroy.
Virtualisation in the sense that if I need an email server, I just create a virtual one on this grid and let it go; if I need a DNS server, I just create one on this grid and let it go; a web server... same thing; an LDAP server, same thing. If a server gets under load, the grid will automatically devote more memory/space/CPU/bandwidth to it as reasonable.
That is my idea of a true grid.
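The plug-in/rip-out grid plus create-and-let-go virtualisation described above can be sketched as a toy placement loop. This is a minimal sketch under loud assumptions: the `Pool` and `Node` classes, the made-up CPU "units", and the most-free-capacity placement rule are all hypothetical, and it ignores memory, disk, bandwidth and the real cost of moving a running server.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One physical box; capacity is in made-up CPU units (hypothetical)."""
    name: str
    capacity: int
    vms: dict = field(default_factory=dict)  # virtual server name -> CPU units

    def free(self) -> int:
        return self.capacity - sum(self.vms.values())


class Pool:
    """Hypothetical grid-plus-virtualisation pool: boxes come and go, servers follow."""

    def __init__(self) -> None:
        self.nodes = {}  # node name -> Node

    def add_node(self, name: str, capacity: int) -> None:
        # "Plug in a blank PC": it simply becomes more capacity.
        self.nodes[name] = Node(name, capacity)

    def create_vm(self, vm: str, cpu: int) -> str:
        # "Create one and let it go": put it on the box with the most room.
        candidates = [n for n in self.nodes.values() if n.free() >= cpu]
        if not candidates:
            raise RuntimeError(f"no room for {vm}")
        target = max(candidates, key=Node.free)
        target.vms[vm] = cpu
        return target.name

    def where(self, vm: str) -> str:
        return next(n.name for n in self.nodes.values() if vm in n.vms)

    def remove_node(self, name: str) -> None:
        # "Rip it out": re-create its virtual servers elsewhere
        # (raises if the remaining boxes are too full to take them).
        for vm, cpu in self.nodes.pop(name).vms.items():
            self.create_vm(vm, cpu)

    def grow_vm(self, vm: str, extra: int) -> str:
        # Under load, give a server more CPU, moving it if its box is full.
        host = self.nodes[self.where(vm)]
        if host.free() >= extra:
            host.vms[vm] += extra
            return host.name
        cpu = host.vms.pop(vm)
        try:
            return self.create_vm(vm, cpu + extra)
        except RuntimeError:
            host.vms[vm] = cpu  # nowhere bigger to go: leave it as it was
            raise
```

Placing each server on the box with the most free capacity spreads load evenly; a real system would also weigh memory, locality and failure domains before deciding.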
Grids are great for non-time-critical computation tasks. But what happens when everyone needs cycles now? My guess is that systems will evolve to give cycles to the highest bidder/highest priority. In such an environment, low-priority tasks will become effectively impossible on a grid: there will always be some higher-priority/higher-paying task that usurps the cycles.
I wonder how long SETI@home will last if home PC users realize they can "sell" cycles to meet for-pay demand for computational power.
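A toy simulation of the highest-bidder scheduler predicted above shows the starvation effect: with a steady stream of paying jobs, a cheap background job (think SETI@home) never gets the CPU. Aging, where a job's effective bid grows while it waits, is one common scheduler remedy; it is swapped in here for illustration and is not taken from any particular grid product. The one-CPU machine, the prices and the function name `run` are all hypothetical.

```python
def run(ticks: int, aging: float = 0.0) -> list:
    """Simulate a one-CPU market; return job names in the order they ran.

    A $1 background job is queued at tick 0, and a new $5 paying job
    arrives every tick. Each tick the job with the highest effective bid
    runs. With aging > 0, a job's effective bid grows by `aging` per tick
    spent waiting.
    """
    waiting = [("seti", 1.0, 0)]  # (name, bid per CPU-hour, tick submitted)
    ran = []
    for t in range(ticks):
        waiting.append((f"paid{t}", 5.0, t))  # a fresh paying job every tick
        best = max(waiting, key=lambda job: job[1] + aging * (t - job[2]))
        waiting.remove(best)
        ran.append(best[0])
    return ran
```

`run(10)` contains only paid jobs, while `run(10, aging=1.5)` lets the background job through on the fourth tick, once its effective bid of 1 + 1.5*3 = 5.5 exceeds the paying bid of 5.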
Two wrongs don't make a right, but three lefts do.
If you're lucky, you may just get last post ...
Ooops, too early. Never mind.
If you've got a problem that's trivially parallelizable, then sure grid computing is great! RC5, seti@home, and similar projects can benefit from grid computing (really, that's what grid computing is -- someone else's code able to run on your machine when it's idle and do work).
However, don't even begin to think you'll be solving anything that requires any sort of processor-to-processor communication. Rocket simulation (our local favorite example here at UIUC), for instance, is heavily communication-based.
The LINPACK benchmark that TOP500 uses also needs a low-latency interconnect to perform really well, so don't expect to see "the grid" sitting in the #1 supercomputer slot on top500.org anytime soon (or really, ever, unless someone develops FTL networking). Latency on the internet in general (and specifically around the world through all those switches and latest_slashdot_hot_chick_movie.torrent packets) is nowhere near what a supercomputer needs.
Now, there are research groups looking at ways of making communication delays less of a problem, including the one I was in during grad school. There are a number of ways to do it, but none of the ones I've seen are going to take on worldwide network latency and survive with their performance intact.
Even something as "simple" as chess wants to have a fast interconnect - every node that's gotten stranded working on low-priority (bad move) work is a wasted node you may as well not have.
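The latency argument can be made concrete with a deliberately crude cost model, assuming the computation divides perfectly across nodes and every synchronization round costs one network latency (bandwidth, contention and load imbalance are ignored). The workload size and the latencies below are illustrative guesses, not measurements.

```python
def runtime(work_s: float, nodes: int, rounds: int, latency_s: float) -> float:
    """Toy model: perfectly divided compute plus one latency per sync round."""
    return work_s / nodes + rounds * latency_s


def speedup(work_s: float, nodes: int, rounds: int, latency_s: float) -> float:
    """How many times faster than running all the work on one node."""
    return work_s / runtime(work_s, nodes, rounds, latency_s)


# Hypothetical job: one CPU-hour of work spread over 1,000 nodes.
WORK, NODES = 3600.0, 1000
for label, rounds, latency in [
    ("independent work units (SETI-style)", 0, 0.100),
    ("10k sync rounds, cluster interconnect ~5 us", 10_000, 5e-6),
    ("10k sync rounds, internet ~100 ms", 10_000, 0.100),
]:
    print(f"{label}: {speedup(WORK, NODES, rounds, latency):.1f}x")
```

It prints roughly 1000x, 986x and 3.6x: the same thousand nodes that are nearly perfect on a low-latency interconnect barely beat a handful of CPUs once each round trip costs 100 ms.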
Slashdot Patriotism: We Support our Dupes!
Am I the only who saw this and thought that GRID laptops were coming back? I loved the old GRID laptop I had, I swear you could drive your car over it and it would still work. I wonder what happened to that company?
I Am My Own Worst Enemy
Er-lang.
The guy in TFA talks about P2P being another type of grid, and says that a family could create a distributed environment for shared data. He also talked about trust.
My idea is that by adding strong encryption you basically get a small private network that is almost impossible to crack. DVDs + CDs + Encrypted P2P among a small group of people == Old Skool Sneakernet (aka borrowing your friend's stuff). You and your friends can share all your entertainment among yourselves as you like. All you need is a P2P-type client, and to share your keys with your friends physically (as in 3.5-inch floppy exchanges).
You want to borrow that new Spider-Man 2 DVD but are too lazy to go over to your friend's place to get it? Send him an email and ask him to rip it to DivX and throw it up on your private encrypted P2P network.
Mod parent up as insightful!
Somebody please analyze what the malware world is doing, and share it with the grid computing gurus. The technology can't be THAT different, can it?
Why, oh why, didn't I take the Blue Pill?
You're adding processing latency to storage latency. But you won't find maximum theoretical latency quoted in their grid rentals.
And when I was a kid, we'd talk about how cool it would be in the year 2000, when everyone would have flying cars and monorails and rocket ships to get around in. Wake me up when any of this actually makes a difference, OK?
Sigs? Sigs? We don't need no steenkin' sigs.
Take a pinch of Standard Linux
Wrap it up in Xen
Add a touch of SELinux
And a little bitty bit of Globus
Oh like a Sandboxed Platform
Oh Lordy, Lordy, mixed with Free and Open Source Code
You know you lump it all together
And you got a recipe for a Multi Vendor Development scene
It is coming though, you know, you know.
What we have is a great big melting pot
Big enough enough enough to take every vendor and all IT's got
And keep it stirring for a hundred years or more
And turn out Application Service and Content Providers by the score.
With apologies to Blue Mink.
"Grid" is all about "You let me use your spare cycles, and I'll pretend I'm going to let you use my spare cycles in return."
"But all your emitter and collector are belong to me!"
"Grid" technology to do this stuff has been around for decades e.g. NQS, hell NASA gave away PBS in the 80s & 90s.
The problem is that most of the CPUs out there run Windows, which is currently damned near useless for this kind of thing. It'll require a rewrite of the OS to take proper advantage of the potential of a network of windows boxes for general purpose computing. OTOH, a couple of shell scripts and SGE (http://gridengine.sunsource.net/) does the job on Linux and other Unix systems.
Government of the people, by corporate executives, for corporate profits.
Because I'm feeling contrarian today, I'll call you on your prediction. While virtualization technology might seem new and hip to some, in computer terms it is an ancient technology, older, in fact, than the operating system itself. Early computers were developed using virtualization of hardware, and IBM ran all of their systems on top of a firmware layer that virtualized the environment the operating system runs on. Higher up in the system, one of IBM's first modern operating systems was VM, a virtual machine operating system that could run itself in one of its virtual machines, as well as other operating systems. It's true that virtual architecture provides an unparalleled level of abstraction, but at the price of performance and administration.
The mainframe of old was in many ways a perfectly virtualized system. Units of computation, whether transaction, interactive or batch, could be purchased in the exact quantity needed, and the computational resources, with some constraints, could be located anywhere geographically and managed by anyone. This suited the computing needs of some very large enterprises, but even in its heyday it was not completely dominant. It was inflexible, because of the centralization necessary to administer it and the high level of abstraction. And it was expensive, due both to its complexity and to the performance-sucking power of virtualization technologies. In part, it was these twin factors that led to the PC revolution. While enterprise computing will continue to evolve, its techniques will never (again) become 'dominant' the way they were from the late 60s through the early 80s, and virtualization will continue to be a tool, but will never be the standard way in which most applications are run or most people interact with computers.
Quote TFA:
Sun Microsystems recently unveiled a new grid computing offering that promises to make purchasing computer time over a network as easy as buying electricity and water.
That sounds very much to me like another attempt by Sun to warp the world back to the "classical" server/dumb-terminal era.
Actually, mod it down as ignorant. 20-year-old grid technologies (AKA Network Queueing Systems) have already solved that problem. You can define policies on CPU/memory/disk/etc. hogs.
Me neither, but for slightly different reasons.
The main definition of a grid is a pattern of intersecting lines. While Sun or IBM may arrange their computers neatly in rows of vertical racks and build it in a grid pattern physically, nothing of this remains in the actual use or architecture of so-called grid computing. This leaves large swaths of parallel algorithms by the wayside. The only things you can efficiently compute on a grid are the "embarrassingly parallel" codes that don't interact much with neighboring CPUs nor require large data sets. Sure, you can do SETI work units and compute large primes, but for chess, weather, and crash sims you'd be better off with a traditional supercomputer or local cluster.
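The "embarrassingly parallel" case is easy to show with the primes example above: split a range into work units, count the primes in each unit independently, and add up the counts. This is a minimal sketch, assuming plain trial division is good enough for illustration; the units run one after another here, and the point is only that no unit ever needs another unit's data, which is what lets a grid ship each one to a different idle machine.

```python
from itertools import starmap
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; slow but fine for a demo."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def count_primes(lo: int, hi: int) -> int:
    """One work unit: count the primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def make_work_units(limit: int, size: int) -> list:
    """Split [0, limit) into independent chunks; no chunk needs another's result."""
    return [(lo, min(lo + size, limit)) for lo in range(0, limit, size)]


def grid_count(limit: int, size: int = 1000) -> int:
    units = make_work_units(limit, size)
    # Each unit could be sent to a different idle machine and run in any
    # order with no messages between them; here they simply run one by one.
    return sum(starmap(count_primes, units))
```

`grid_count(10_000)` returns 1229, and the answer does not depend on the chunk size or on the order in which the units finish.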
Yes, yes you are. And, why would I give a shit anyway?
What no one is mentioning is that these big clusters/grids that Sun and IBM are building to sell over the network are dependent on the ratio between network speed and batch file sizes.
EXAMPLE: IBM is currently offering CPU/Hour service in Houston to oil and gas companies. Sounds great till you realize the multi-terabyte files that consume such a massive compute service are too big to be readily sent over the network. Instead they use vans to haul tape and disk over to IBM and then run the process on it.
What is the bandwidth of a station wagon? Right now it's faster than the internet on a 20-mile drive across Houston.
But take it a step further and the ratio remains. What if I wanted to pay Sun by the hour for CPUs while I worked on a big 200 GB Maya render? By the time I've sent that over a cable modem, have I gained much time at all?
The problem I see is that we are making CPUs massively parallel, but not networks. So will it EVER make sense to send a massive file to a commercial grid over a single network connection?
Someone should do the math.
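Taking up the request to do the math, here is a back-of-the-envelope sketch. The 1 Mbit/s cable-modem upstream, the 10 TB of tapes in the van, the 30 mph average across town and the 100-hour local render are assumptions picked for illustration, not figures from the comment or from IBM.

```python
def transfer_hours(gigabytes: float, megabits_per_s: float) -> float:
    """Hours to push `gigabytes` (10**9 bytes) through a link of `megabits_per_s`."""
    bits = gigabytes * 1e9 * 8
    return bits / (megabits_per_s * 1e6) / 3600


def vehicle_megabits_per_s(gigabytes: float, miles: float, mph: float) -> float:
    """Effective bandwidth of a vehicle carrying `gigabytes` of tapes or disks."""
    seconds = miles * 3600 / mph
    return gigabytes * 1e9 * 8 / seconds / 1e6


def grid_pays_off(local_hours: float, grid_hours: float, upload_hours: float) -> bool:
    """Renting CPUs only wins if upload time plus grid run time beats running locally."""
    return upload_hours + grid_hours < local_hours


upload = transfer_hours(200, 1.0)               # assumed 1 Mbit/s upstream
wagon = vehicle_megabits_per_s(10_000, 20, 30)  # assumed 10 TB, 20 miles at 30 mph
print(f"200 GB upload: {upload:.0f} h ({upload / 24:.1f} days)")
print(f"van across Houston: {wagon:,.0f} Mbit/s")
print("worth renting?", grid_pays_off(local_hours=100, grid_hours=2, upload_hours=upload))
```

Under these assumptions the 200 GB upload takes about 444 hours (18.5 days), the van works out to about 33,333 Mbit/s, and renting loses to a 100-hour local render: the grid only helps when the local job would take longer than the upload itself.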
Didn't we have a whole rivalry a century and four score ago that taught some of us that?
...but seriously, yes there would definitely be massive comm-overhead involved, not to mention the overhead and cost of validating the data to make sure it's an actual result and not a "needle in the haystack" that would hurt or even destroy the precision of the results. Take SETI@home for example.
You can hold down the "B" button for continuous firing.
The use of the word "grid" here is in the sense of an electric power grid. The idea is that you should get computing power on demand, just like you get electric power on demand.
Sun has been experimenting with eBay for quite a while now. It would be pretty neat if they could figure out a way to auction off chunks of their grid on some sort of how-much-and-how-soon basis, like you say. If a movie company or fluid dynamics contractor needs the whole thing yesterday, they would be willing to pay a premium to avoid building a grid of their own and to get a few thousand CPUs _right_now_.
-- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
I will probably look back on this in 5 years and be like, "That was a moronic thing to say." But I think it would be an interesting concept if the whole network shared CPU cycles. Kind of a communist state of process, especially if storage was distributed as well, everything running asynchronously in one giant mass of processing power: the whole internet, hundreds of millions of CPUs running ASMP. Crypto would be dead. Sounds like fun to me.
Like I said, though, I will probably look back and call myself a moron; I mean, I used to support Red Hat... lol.
"A learning experience is one of those things that says, 'You know that thing you just did? Don't do that.'" - DNA
Now that I think of it, this sounds a lot like airline ticket pricing. Cheaper three weeks out, getting more expensive up to the day of the flight, but getting really really cheap just before takeoff (e.g., Priceline.com). The difference is that CPUs don't take off, so the price dip at the end wouldn't happen (Sun could just turn off the servers if they really want to).
Would not quantum networking allow instant transfer of information no matter the distance between systems? Or am I mixing fantasy and reality again?
Usually that only happens in bed.. And I get smacked for it!
I completely agree. Nothing is really new. It was all invented at DEC eons ago...
Virtualization alone is just another layer to manage. There's no point in hiding a technology if you are replacing it with an equally complex and less efficient one.
You are correct that virtualization adds expensive overhead, and it can be complex and inflexible.
I believe the solution lies where the costs of downtime associated with direct ties to hardware outweigh the costs of virtualization. Virtual memory systems are a classic example. As you know, VM has been around since the early 70s, but it wasn't widely adopted in PCs until the 90s. That's because the costs of virtual memory were too high; increasing computing power then helped bring those costs in line. Once the benefit justified the cost: bingo.
The other piece is automation. Virtualization has to be as easy as, say, virtual memory. As an end user, I don't have to worry about virtual memory; at worst, I may be involved in sizing the swap space. I don't even have to worry about virtual memory as a developer (for user-space apps). It is all taken care of by the kernel and the hardware. This same simplicity has to be achieved with virtual systems; it never was with mainframes.
The reason for such an arrangement is that high-speed interconnects are expensive. Building a single cluster that is uniformly very high performance would be horrible for anyone other than a very rich organization to consider.
On the other hand, grids alone are way too slow to handle the needs of time-critical communication, which is what you have a lot of the time in parallel computing.
A hybrid, able to place components of a problem according to that component's needs, would seem to be the logical solution. It is also the scalable solution. Clusters often have an upper limit in size. By having grids of clusters, you have a virtually infinite capacity. True, there simply aren't any clusters that have reached the upper limit. Yet. But it's getting tough at the size they are at right now.
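The hybrid "grids of clusters" idea amounts to a placement decision per component. A minimal greedy sketch follows, assuming each component's message rate is known in advance and using a made-up rule of thumb (network latency may eat at most 10% of each second); the `Component` and `Site` types, the site and component names, and that threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    cpus: int
    messages_per_s: float  # how chatty it is with its peers (assumed known)


@dataclass
class Site:
    name: str
    free_cpus: int
    latency_s: float  # message latency between CPUs at this site


def place(components, sites, max_comm_fraction=0.1):
    """Assign each component to a site; returns {component name: site name}."""
    plan = {}
    # Place the chattiest components first, while fast CPUs are still free.
    for c in sorted(components, key=lambda comp: comp.messages_per_s, reverse=True):
        ok = [
            s for s in sites
            if s.free_cpus >= c.cpus
            and c.messages_per_s * s.latency_s <= max_comm_fraction
        ]
        if not ok:
            raise RuntimeError(f"no site can host {c.name}")
        # Prefer the slowest acceptable site so fast clusters stay free.
        site = max(ok, key=lambda s: s.latency_s)
        site.free_cpus -= c.cpus
        plan[c.name] = site.name
    return plan


sites = [Site("cluster", 64, 5e-6), Site("grid", 10_000, 0.1)]
jobs = [
    Component("cfd-solver", 32, 10_000),
    Component("mesh-refine", 16, 100),
    Component("param-sweep", 500, 0.5),
]
print(place(jobs, sites))
```

Here the chatty solver and mesh refinement land on the cluster, while the parameter sweep goes out to the wide-area grid. Preferring the slowest acceptable site is a design choice to keep scarce low-latency CPUs for the work that actually needs them.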
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I did my dissertation on Grid computing a year ago or so. My group predicted utility computing to be just like a power utility grid. You will plug in a device, and you will be charged a certain amount of money per MIPS or some other measurement of computing power.
A Beowulf cluster of computer grids.
Signature.
Condor was made many years before grid computing; it is just a bad comparison for people who don't know how to explain grid computing.
"Deployment of deliverables on-cycle and on-quota"
I HATE management-ese. It's nothing but BS. "...deployment in the science space," "...focus on vertical particulars, financial services for example..."
If you need to generate obtuse buzzwords to justify your job, I need to generate ways and means of deploying you at the unemployment line. This is worse, because the guy seems to come from academia, not industry.
For internet scaled distributed computing, there is an online "Hello World" code demonstrating the Grid Computing concept. http://grinix.sourceforge.net/ Grinix is based on the Globus Toolkit, Linux and Java.
For parallel computing in the scale of a local network, there is an open source Live CD project called "Thinux (Cluster Live)" http://thinux.sourceforge.net/thingindex.html. Cluster Live allows a cluster of networked diskless thin clients to boot up and start an application, such as the web browser, as well as allowing computing resources to be shared with each other.
Because I'm feeling contrarian, too, I'll call you on your claims. Virtualization can be very cheap, and very easy to administer. VM/370 was based on CP/CMS, which was developed using government money, so it was open source. In an early example of why open source is such a good idea, several big timesharing companies took CP/CMS and hacked CMS to get rid of the real I/O instructions (CCW's, or Channel Command Words) inside it. You see, CMS was a real single-user OS. So CMS could run on bare hardware, just like it could run under CP. Thus, CMS issued CCW's to talk to what CMS thought were "real" I/O processors on "real" hardware. Which meant that when CMS ran under CP in user (non-privileged) mode, every time the machine tripped over one of these CCW's, an illegal instruction trap was generated. The trap was caught by CP, which then parsed and painstakingly emulated the CCW in an extremely complex routine called "CCWTRANS." Many have lost their sanity reading the code to CCWTRANS. Anyway, although really cool, this strategy also turned out to be really expensive.
Meanwhile, because they all had the source code to CP/CMS, the timesharing companies all came up with the same basic great idea. They hacked CMS to get rid of the CCW's, and replaced the CCW's with the equivalent of fast BIOS traps into CP. So CP didn't have to translate or emulate anything any more, things began to run at native speed, and suddenly everything was lickety-split fast again. In fact, this hack sped up CMS to the point where the premier speed vendor, National CSS, could run 250 users with decent performance on a 370/168 mainframe. VM/370, meanwhile, topped out at a measly 60-70 users. IBM either never figured out the hack, or as is more probable, wasn't very interested in VM/370 anyway (their cash cow was and still is OS/MVT and its successors).
So you are correct; VM/370 was a dog. But CP/CMS, hacked with traps, was totally amazing. I was there; I was a CP system programmer; I know.
The modern equivalent of this strategy is called Xen. Xen has been a topic here before. I predict you will see a lot more about it in the future.
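The difference between CP's trap-and-emulate path and the hacked-CMS (and Xen-style paravirtual) path is structural, and a toy model shows where the extra work goes. Everything here is hypothetical: the exception stands in for a privileged-instruction fault, the one-line "WRITE:data" format stands in for a real channel program, and none of it reflects CP's actual code or Xen's hypercall interface. It counts events rather than timing them.

```python
class PrivilegedInstruction(Exception):
    """Stand-in for the fault an unmodified guest triggers on 'real' I/O (a CCW)."""

    def __init__(self, program: str) -> None:
        super().__init__(program)
        self.program = program


class Hypervisor:
    def __init__(self) -> None:
        self.traps = 0       # faults caught and emulated
        self.hypercalls = 0  # direct requests from modified guests
        self.disk = []       # what the virtual device has received

    def run_unmodified(self, guest_io) -> None:
        # Trap-and-emulate: let the guest run, catch the fault, then decode
        # its I/O program and perform the equivalent operation on its behalf.
        try:
            guest_io()
        except PrivilegedInstruction as trap:
            self.traps += 1
            self.disk.append(self._decode_and_emulate(trap.program))

    def _decode_and_emulate(self, program: str) -> str:
        # Hypothetical stand-in for CCWTRANS: parse a program like "WRITE:hello".
        opcode, _, data = program.partition(":")
        if opcode != "WRITE":
            raise ValueError(f"unsupported channel command {opcode!r}")
        return data

    def hypercall_write(self, data: str) -> None:
        # Paravirtual path: the modified guest asks directly; nothing to decode.
        self.hypercalls += 1
        self.disk.append(data)


def unmodified_guest_write(data: str):
    """Return the I/O routine an unmodified guest would run."""
    def io() -> None:
        raise PrivilegedInstruction(f"WRITE:{data}")  # the "real" I/O faults
    return io
```

Both guests end up writing the same data, but the unmodified one pays for a fault plus a decode step on every I/O, which is the overhead the timesharing vendors' hack removed.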
http://www.worldcommunitygrid.org/
Current Project:
Human Proteome Folding Project: A layperson's Explanation
Proteins are essential to living beings. Just about everything in the human body involves or is made out of proteins.
What are proteins?
Proteins are large molecules that are made of long chains of smaller molecules called amino acids. While there are only 20 different kinds of amino acids that make up all proteins, sometimes hundreds of them make up a single protein.
Adding to the complexity, proteins typically do not stay as long chains. As soon as the chain of amino acids is built, the chain folds and tangles up into a more compact and particular shape that lets it conduct specific and necessary functions within the human body.
Proteins fold because the different amino acids like to stick to each other following certain rules. Imagine that amino acids are pop-beads of 20 different colors. The pop-beads are sticky, but sticky in such a way that only certain combinations of colors can stick together. This makes the amino acid chains fold in a particular way that creates proteins that are useful to the human body. Human cells have mechanisms to help the proteins fold properly and, equally important, mechanisms to get rid of improperly folded proteins.
How do proteins relate to human genes?
The collection of all of the human genes is known as "the human genome." Depending on how the genes are counted, there are over 30,000 genes in the human genome. Each gene, which is a section of a long chain known as DNA, dictates how to build the chain of amino acids for one of the 30,000 proteins. In recent years, scientists were able to map the sequence for each human gene. This means that we now know the sequence of amino acids in all of the human proteins. Thus, the human genome is directly related to the "human proteome," the collection of all human proteins.
The protein mystery
While researchers have learned a great deal about the human proteome, the functions of most of the proteins remain a mystery. The genes do not reveal exactly how the proteins will fold into their final shape, which is critical because that determines what a protein can do and what other proteins it can connect to or interact with.
Proteins are like puzzle pieces. For example, muscle proteins connect to each other to form a muscle fiber. They join together in a specific manner because of their shape, as well as other factors relating to the shape.
Everything that goes on in cells and in the body is very specifically controlled by the shape of the proteins that do or do not let proteins interlock with other proteins. For example, the proteins of a virus or bacteria may have particular shapes that enable it to break through the cell membrane, allowing it to infect the cell.
The Human Proteome Folding Project
Knowing the shapes of proteins will help researchers understand how proteins perform their desired functions and also how diseases prevent proteins from doing their necessary functions to maintain healthy cells.
The Human Proteome Folding Project will combine the power of millions of computers in a grid to help scientists understand how human proteins fold. The work to be done in this monumental task is shared across this grid, so that results can be achieved far sooner than would be possible with conventional supercomputers. With a greater understanding of protein structure, scientists can learn how diseases work and ultimately find cures for them.
When your grid agent is running, it is folding an amino acid chain in various ways and evaluating how well each folding follows the specific rules of how specific amino acids stick together or not. As computers try millions of ways to fold the chains, they attempt to fold the protein in the same way that it actually folds in the human body. The best shapes identified for each protein are returned to the scientists for further study.
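The "try many folds and score them against stickiness rules" loop can be illustrated with the textbook hydrophobic-polar (HP) lattice model, a drastic simplification in which each residue is either H (sticky) or P, and the chain lives on a 2D grid. It is swapped in purely as a teaching toy: it is not the method the Human Proteome Folding Project uses, and the exhaustive search below only works for very short chains.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(directions: str):
    """Put residue 0 at the origin and follow the moves; None if the chain hits itself."""
    coords = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def energy(sequence: str, coords) -> int:
    """-1 for each pair of H residues that touch on the grid but are not chain neighbours."""
    pos = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair counts once
            j = pos.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                e -= 1
    return e


def best_fold(sequence: str):
    """Try every fold and keep the lowest-energy one as (energy, coords)."""
    best = (1, None)
    for dirs in product("UDLR", repeat=len(sequence) - 1):
        coords = fold("".join(dirs))
        if coords is not None:
            e = energy(sequence, coords)
            if e < best[0]:
                best = (e, coords)
    return best
```

`best_fold("HPPH")` finds an energy of -1: the chain bends into a square so the two H ends touch. Real folding codes replace exhaustive enumeration with sampling and far richer scoring, which is where the donated cycles go.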
Actually, Java would be perfect for managing a grid.
:)
If you think so, check out Ibis. You can download the latest version from the link and play around with it, or read this old Slashdot story.
Disclaimer: I'm not directly involved with the project, but I work in the same group as the developers. And I don't mind pushing a good idea.
The first is at the low level: containers in which to run an operating system. This allows the system to be provisioned with whatever OS the user requires, no matter what the hardware layer. That gives the user more options for where their jobs might run, rather than scouring the world for the one server that has the right OS, is cheap enough, and can have the job done by next Tuesday.
At the higher level there is the virtualisation of collections of resources into a computing infrastructure at which you can throw your job (along with some policies, such as what your job does, how much you are prepared to pay to have it done, and the fact you want it done by next Tuesday). Ultimately this virtualisation is into the 'Grid' - just one virtual computer that gets your stuff done.
There is a whole series of underlying tools and technologies that make this possible; the likes of Sun Grid Engine, Condor, Globus, Web services, BPEL, etc., are tools to construct such a grid, and they all run on the lower-level tools (TCP/IP and the like).
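The "throw your job at the grid along with some policies" idea can be sketched as a matching function over resource offers. Assumptions, all hypothetical: each site advertises an OS (or `None` if it can provision any OS in a virtual container), a price per CPU-hour, a CPU count and a queue wait, and a job spreads perfectly across a site's CPUs. Real brokers built on the tools named above are far more involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Offer:
    site: str
    os: Optional[str]  # None = can provision any OS in a virtual container
    price_per_cpu_hour: float
    cpus: int
    queue_wait: timedelta


@dataclass
class Job:
    cpu_hours: float
    os: str
    budget: float       # the most you are prepared to pay in total
    deadline: datetime  # "done by next Tuesday"


def choose(job: Job, offers: List[Offer], now: datetime) -> Optional[str]:
    """Return the cheapest site meeting the job's OS, budget and deadline, or None."""
    feasible = []
    for o in offers:
        if o.os is not None and o.os != job.os:
            continue
        cost = job.cpu_hours * o.price_per_cpu_hour
        # Optimistic: assume the work splits perfectly over the site's CPUs.
        finish = now + o.queue_wait + timedelta(hours=job.cpu_hours / o.cpus)
        if cost <= job.budget and finish <= job.deadline:
            feasible.append((cost, finish, o.site))
    return min(feasible)[2] if feasible else None


now = datetime(2005, 6, 1, 9, 0)
offers = [
    Offer("linux-farm", "linux", 0.5, 100, timedelta(hours=1)),
    Offer("virtual-pool", None, 1.0, 500, timedelta(hours=2)),
    Offer("small-solaris", "solaris", 0.8, 10, timedelta(0)),
    Offer("big-solaris", "solaris", 2.0, 1000, timedelta(0)),
]
job = Job(cpu_hours=1000, os="solaris", budget=1500, deadline=now + timedelta(days=3))
print(choose(job, offers, now))
```

With a deadline three days out, only the container-based site is both fast and cheap enough, so this prints `virtual-pool`; stretch the deadline to five days and the cheaper 10-CPU Solaris site wins instead.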
Anyone who has ever written a grid application or used the now dominant Globus toolkit (which has recently moved to web services!!!#%^&*) knows that the grid is nothing more than a mechanism for getting papers published.
If a 501(c)(3) put together a grid, then everyone donating computer time would get a tax deduction for the time spent computing. The 501(c)(3) would donate time to charitable and educational uses... a cure for AIDS, etc. Get an appraiser to specify the value of a minute of donated time so that you can deduct more than $500 worth. I think there are established values for supercomputing time. The first true example of earning money while you sleep.