Grid Computing at a Glance
An anonymous reader writes "Grid computing is the "next big thing," and this article's goal is to provide a "10,000-foot view" of key concepts. This article relates many Grid computing concepts to known quantities for developers, such as object-oriented programming, XML, and Web services. The author offers a reading list of white papers, articles, and books where you can find out more about Grid computing."
Gentoo Linux is an interesting new distribution with some great features. Unfortunately, it has attracted a large number of clueless wannabes and leprotards who absolutely MUST advocate Gentoo at every opportunity. Let's look at the language of these zealots, and find out what it really means...
"Gentoo makes me so much more productive."
"Although I can't use the box at the moment because it's compiling something, as it will be for the next five days, it gives me more time to check out the latest USE flags and potentially unstable optimisation settings."
"Gentoo is more in the spirit of open source!"
"Apart from Hello World in Pascal at school, I've never written a single program in my life or contributed to an open source project, yet staring at endless streams of GCC output whizzing by somehow helps me contribute to international freedom."
"I use Gentoo because it's more like the BSDs."
"Last month I tried to install FreeBSD on a well-supported machine, but the text-based installer scared me off. I've never used a BSD, but the guys on Slashdot say that it's l33t though, so surely I must be for using Gentoo."
"Heh, my system is soooo much faster after installing Gentoo."
"I've spent hours recompiling Fetchmail, X-Chat, gEdit and thousands of other programs which spend 99% of their time waiting for user input. Even though only the kernel and glibc make a significant difference with optimisations, and RPMs and .debs can be rebuilt with a handful of commands (AND Red Hat supplies i686 kernel and glibc packages), my box MUST be faster. It's nothing to do with the fact that I've disabled all startup services and I'm running BlackBox instead of GNOME or KDE."
"...my Gentoo Linux workstation..."
"...my overclocked AMD eMachines box from PC World, and apart from the third-grade made-to-break components and dodgy fan..."
"You Red Hat guys must get sick of dependency hell..."
"I'm too stupid to understand that circular dependencies can be resolved by specifying BOTH .rpms together on the command line, and that problems hardly ever occur if one uses proper Red Hat packages instead of mixing SuSE, Mandrake and Joe's Linux packages together (which the system wasn't designed for)."
"All the other distros are soooo out of date."
"Constantly upgrading to the latest bleeding-edge untested software makes me more productive. Never mind the extensive testing and patching that Debian and Red Hat perform on their packages; I've just emerged the latest GNOME beta snapshot and compiled with -O9 -fomit-instructions, and it only crashes once every few hours."
"Let's face it, Gentoo is the future."
"OK, so no serious business is going to even consider Gentoo in the near future, and even with proper support and QA in place, it'll still eat up far too much of a company's valuable time. But this guy I met on #animepr0n is now using it, so it must be growing!"
-
And with this change in computing comes another challenge. Not every company has applications that would benefit from distributed computing, but many do. The challenge is making a secure environment that will allow Company A to send their data *and* the software to process that data down the pipe to Company B for processing, meter the usage, and charge back the service. From what I have seen, no farm is really ever utilized 100% of the time, but there are crunch periods where something has to be simulated within a certain timeframe and the existing throughput on hand is not enough. It is those crunch times where you could really use a few trillion spare cycles.
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
Grid computing is the "next big thing"
But I thought that this was the next "killer app"?
People couldn't type. We realized: Death would eventually take care of this.
I thought Social Software was the next big thing
Grid computing is not about making a giant computing farm out of a bunch of distributed machines.
see, that's the major fallacy of the hype behind "The Grid". yes, one of the benefits can be seen in the supercomputing realm, where you can link up many different machines (we haven't gotten to doing this between architectures yet, mind you) to make a gianto-machine.
however, the key in *all* of this is the technologies that allow for that to happen, along with the data transfer, authentication, and authorization, et al, that have to happen.
as far as cycles go, no, we probably won't see a dynamically created, scheduled, and allocated meta-supercomputer anytime soon. most companies will use these technologies to make static or mostly-static links between a few select sites and partners for now.
however, these protocols (GridFTP, ack), standards (OGSA, ...), and ideas are the important part here. having these "Grid" concepts built into every new technology (filesystems: NFSv4, security: Globus GSI, etc.) will allow these linkups, data transfer, and whatever we may want to do, to happen much more efficiently in the future.
to wit: the killer app in "The Grid" is not to make a giant supercomputer. it's to develop a lot of different ideas and technologies which allow for resource sharing (at the general level, among other things) to occur in a standardized, efficient, and logical fashion in the future. no one will use all of them, but the key is to use what you need from what "The Grid" encompasses. that's why it's referred to as "The Third Wave of Computing"!
1. e-mails with "EARN $$$ DOING NOTHING" ...
2. spyware that not only spies but also hijacks your CPU cycles for remote computation
3. dubious companies selling "grid computing" service pop up all over the place
4.
5. Profit?
It may look funny, but what if the next version of Windows comes embedded with this kind of thing? All it would take would be some marketing genius to convince enough people. (disclaimer: yes this is slightly paranoid, it's not intended to be MS bashing, just an example on how this technology could be misused).
The ENIAC Demo Competition
It really could be one of the next big things; considering the advent of object-oriented methods of handling information, it realistically could be a viable object-oriented model.
With the relatively recent move to object-oriented programming, you could think of this as just the next level of abstraction, abstracting your objects out to a broader system level as opposed to an implementation level.
In any event it would be a good scheme for many things that need distributed systems, such as cryptographic research.
Sounds like sourceforge projects; especially the discussion on standards and protocols going on in oss4lib right now.
it msut be late...or early. for a second, i thought i saw "girl computing...." and started thinking "wow, /. is getting a bit blunt these days....."
xao
xao
http://TheHillforum.hopto.org
You obviously didn't get the memo
I happen to know that beowulf clusters of quantum iPods, built by nanobots, running social software, using a Post-OOP paradigm and a journaled filesystem over a wireless IPv6 network to make profit with a subscription-based publishing model will be the next big thing.
Sun is heavily involved in Grid computing. They provide free multiplatform grid software (including for Linux), case studies, white papers, etc.
They also host an open source project Grid Engine for the software. The software used to be commercial, but Sun bought it and open sourced it, like they did with Open Office.
it msut be late...or early. for a second, i thought i saw "I'm so fucking funny - I can't read" and started thinking "wow, /. is getting a bit retarded these days....."
sounds a lot like good 'ole fashioned SMP to me, with a lot more disk space. As we all here know, not all computer-related tasks work well in a multi-processor platform, and as someone who has played with SMP programming, it certainly adds an order-of-magnitude level of complexity to try to harness the full power of SMP in your code. Compilers help, but not much...
never bring a twinkie to a food fight.
Neither this nor Social Computing is the next killer app... Social Grid Computing is.
I love teh int4rw3b!!!!!111one1
This is just an inverted version of the "network computing" universe where we all use thin clients that use a central server to do work. It can never become mainstream due to the physical limitations, not the technology ones. Suppose I am a corporation and I need a new big-iron system to process daily orders from our web site. Let's try grid computing: all 1000 employees in the company install a piece of software on their PC so we can use each PC to process an order, based on availability. The number of problems with this, as compared to using a central server, is incredible.
1) Still need a central server for storage/backup
2) One server needs one UPS, 1000 workstations...
3) Workstations are flaky: They reboot, crash, play video games, etc. The distributed software can handle this, but the inefficiency involved is painful. I hope everybody doesn't run Windows Update all at once, or all the PCs could go down.
4) The corporate network is now a bottleneck.
I rattled off this list in about 30 seconds, so I'm sure there are lots more. Since these are physical limitations, not technology limitations, they aren't going away.
Six years ago we were clamoring to use the desktops on the trading floor to run some of our financial models at night. We tried MPI and CORBA, and kludged together a workable (although lacking) solution. I can definitely see where a hodgepodge solution like that needs to be improved, and it looks like the grid concept is looking to fill that gap, but at the same time the desktop is evolving. The Net PC seems to have gone the way of the dodo, and while it is true that there are plenty of idle desktops during the evening, I would say most of the servers are well utilized by the departments.
The real benefit would be pulling in all that desktop power, but I do not believe desktops will remain as they are. With mobile work forces and reshaped IT departments, workers are more likely to move about the company and form resource pools. In order to do that, they will need to be productive as soon as they set up shop with a new team. Current infrastructure makes that difficult to manage. The more successful implementations have those workers using laptops, which go home at night. Goodbye spare cycles. Future concepts seem to be brewing where you leave the peripherals and just carry around a small PDA-sized CPU+storage, plug that in at any station, and you're set to go. In that scenario the spare cycles are walking around with you. I only see limited use for consolidating existing server processing. There are already plenty of technologies that address that need. Not sure I buy this as "the next big thing", but I guess it may be the next short-term thing.
Anonymous Coward
10,000-foot view? What was wrong with the last cruddy neologism, helicopter view, or, heaven forfend, an overview? Still, I'm quite happy to run it up the old flagpole and see if anyone salutes it.
I have given a lot of thought to this concept in the past and, although I think it has a lot of merit I also think it will require a different underlying software architecture than any of those we use today.
Currently for distributed computing we have Thin-Client/Fat-Server, Client/Server, N-Tier and Shared-Node architectures. I think most people are expecting a Shared-Node or Client/Server for Grid Computing because that is how existing implementations work. The issue with either of those is the size of the work unit. If the work unit is small then the nodes/clients must synchronize often. If the work unit is large then you are more likely to have nodes/clients in a wait state because required processing is not completed.
Using a network style architecture (distributed Shared-Node) raises more issues because of message routing. Interestingly, this is the 'web-service' model! For example a web site must verify a customer, charge her credit card, initiate a shipping action and order from a factory in a single transaction. So you get four sub-transactions. Let's say that each of those initiates two sub-transactions of its own and each of those initiates one sub-transaction of its own. We now have a total of twenty transactions in a hierarchy that is three deep. Let's also assume that we only have one dependency (the verification) before launching all other transactions asynchronously.
The problem here is response times: they add up. If the average response time is 500 ms, then three transactions deep gives us 1500 ms. The dependency, at a minimum, doubles this. So it takes three full seconds to commit the transaction. Something a user might be willing to live with until a netstorm occurs and the response time climbs to thirty seconds or more. (Note: Isn't it funny how you never see this math done in the whitepapers pushing web services?) But three seconds is far too long for synchronizing between nodes of a distributed computing grid unless you only have to do it every once in a great while, pushing us towards large work units and idle nodes!
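For anyone who wants to check the numbers, here is a rough Python sketch of that arithmetic. The fan-outs (4, 2, 1) and the 500 ms figure come from the example above; the function names are made up for illustration, and the model simply assumes each level of nesting costs one full average round trip, which is a simplification.

```python
# Rough model of the nested web-service example.
# Assumptions: fan-outs of 4, 2 and 1 per level, 500 ms per call,
# and one blocking dependency that doubles the critical path.

def count_transactions(fanouts):
    """Total sub-transactions in a tree with the given per-level fan-out."""
    total = 0
    at_level = 1
    for fanout in fanouts:
        at_level *= fanout   # calls made at this level of the hierarchy
        total += at_level
    return total


def commit_latency_ms(depth, avg_response_ms, dependency_factor=2):
    """Nested calls add up level by level; the dependency multiplies the sum."""
    return depth * avg_response_ms * dependency_factor


fanouts = [4, 2, 1]
total = count_transactions(fanouts)
latency = commit_latency_ms(len(fanouts), 500)
print(f"{total} sub-transactions, about {latency} ms to commit")
```

Swapping in a 5000 ms average response shows how quickly a congested network pushes the commit time to the thirty-second range mentioned above.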
So the Internet itself imposes costs on a distributed model that wouldn't exist on, say, a Beowulf cluster because that cluster would have a dedicated high-speed network. Client/Server architectures work better for the Internet, but require dedicated servers and a lot of bandwidth to and from them.
I believe the real answer lies in what I call a Cell architecture. This would require servers, but their job would be to hook up nodes into computing 'cells' consisting of one to N (where N is less than 256?) nodes. Each node would download a work-unit from the server appropriately sized to the cell, along with net addresses of the other nodes in the cell. Communication would occur between the nodes until the computation is complete and then the result would be sent back to the server. When a node completes its work unit (even if all computation for the cell is not complete) it detaches and contacts the server for another cell assignment.
By reducing cross-talk to direct contact between nodes within the cell we allow smaller work units. By using a server to coordinate nodes into cells we are allowed to treat the cells as larger virtual work units.
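To make the idea a bit more concrete, here is a minimal Python sketch of the server side of such a Cell architecture. Everything in it is hypothetical: the class names, the round-robin split of the work unit, and the cap of 255 nodes (taken loosely from the "N is less than 256?" guess above). It only shows cell formation; the node-to-node communication and result collection are left out.

```python
from dataclasses import dataclass, field

MAX_CELL_SIZE = 255  # the post only guesses that N is below 256


@dataclass
class Assignment:
    node: str    # address of the node receiving this assignment
    peers: list  # addresses of the other nodes in the same cell
    work: list   # this node's slice of the cell's work unit


@dataclass
class CellServer:
    cell_size: int = 4
    waiting: list = field(default_factory=list)

    def register(self, node_addr):
        """A node that is idle (or has finished its slice) asks for a cell."""
        self.waiting.append(node_addr)

    def form_cell(self, work_unit):
        """Group waiting nodes into one cell and split the work among them."""
        size = min(self.cell_size, MAX_CELL_SIZE, len(self.waiting))
        if size == 0:
            return []
        members, self.waiting = self.waiting[:size], self.waiting[size:]
        return [
            Assignment(
                node=node,
                peers=[m for m in members if m != node],
                work=work_unit[i::size],  # simple round-robin split
            )
            for i, node in enumerate(members)
        ]


server = CellServer(cell_size=3)
for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]:
    server.register(addr)

cell = server.form_cell(list(range(10)))
for assignment in cell:
    print(assignment.node, "talks to", assignment.peers, "works on", assignment.work)
```

A node that finishes early would call register() again, which matches the "detaches and contacts the server for another cell assignment" step described above.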
Comments?
- -
Are you an SF Fan? Are you a Tru-Fan?
Well, this scenario would not be appropriate, since there's hardly any processing involved in web orders. Mostly that is just database queries. But you could easily imagine that you'd see a useful speedup if you had your advertising firm's 3D animations rendering on every computer in the office, or your software development company's nightly build/regression suite. Fault tolerance (not trivial, but not impossible either) takes care of 2 and 3, so you just need to find an application that's appropriate to take care of 1 and 4.
..."The network is the computer" but IBM couldn't bring themselves to use that phrase.
Some companies have "a not invented here" problem with stuff but not IBM : Java, Linux, Cell, J2EE. Is there anything more substantial to IBM than a marketing department and two factories (one to make models of factories and the other churning out hard-disks designed to fry after 24hrs continuous use).
Why doesn't IBM just "show Sun the money" so they can get it on
'Be the change you want to see in the world' - Al Gore
First off, this stuff has been completely mainstream for over 30 years now. The only thing new is that it keeps getting renamed. This year it's called GRID. I remember when it was called timesharing, and Time magazine had cartoons depicting it in 1973.
The entire GRID standard actually only covers the data transfer and login, because that's the only thing standard about the different types of hardware. You still need to write the software specific to the hardware. Even with tools like MPI, programming for Sun big iron is nothing at all like IBM big iron. And you don't exactly use Java. The value is not in the software - that's why it's getting standardized and is given away for free. The value, as always, is in owning a huge pool of computing power and renting it out, or even better, selling it in racks full.
The only people benefiting financially are the people that make the hardware - IBM, HP, Sun, Fujitsu, etc. Just like 30 years ago. Open Source has completely devalued the software - why pay for that, money is better spent on more hardware.
Then there is the cost of transporting the terabytes of data involved in the types of problems you do with these systems. Transport costs are more than the computing costs in many cases - another reason that part got standardized.
Hardware costs are falling FAST. Blade mounted and racked CPUs are running about $500/GHz ($7k for the same from IBM). That means for about $1 million you can get something like 2K CPUs and 2 THz of power, running Linux and all the tools you need. That's a lot of FLOPS.
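The back-of-the-envelope numbers work out as follows in a short Python sketch. The $500/GHz price and $1 million budget are from the paragraph above; the 1 GHz per CPU figure is an assumption implied by "2K CPUs and 2 THz", and the function name is made up.

```python
def buying_power(budget_usd, usd_per_ghz, ghz_per_cpu):
    """Return (total GHz, number of CPUs) a budget buys at a given price."""
    total_ghz = budget_usd / usd_per_ghz
    return total_ghz, total_ghz / ghz_per_cpu


# Assumption: each racked CPU runs at roughly 1 GHz.
total_ghz, cpus = buying_power(1_000_000, 500, 1.0)
print(f"{total_ghz / 1000:.1f} THz across about {cpus:.0f} CPUs")
```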
For those kinds of costs, outsourcing it seems silly. You still have to do all software development, data transport, post-processing, and research yourself anyway, and those costs DWARF the hardware/electricity/HVAC costs of owning the hardware and having exclusive access 24/7 until the next upgrade.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
I've seen entirely too many articles (such as recently appeared in SciAm, and now this one appearing on /.'s FP) giving the "10,000-foot view" of grid computing.
I've seen a few articles giving the 10-micron view, describing CPU architectures making use of a grid topology.
I've seen a few small demos of massively distributed clusters. I've heard hype about the idea of a service provider and service consumer oriented topology. I've heard about self-healing networks. I've heard about the PS3 making use of a grid-based system.
I have not heard any of the "step 2s", the means by which we transition from individual PCs accessing a network, to a single shared "grid computer" actually composed of the network. At least, nothing that would make the resulting network noticeably different than the current internet.
For individual systems (ala the PS3), grid computing seems like possibly the next big thing, sort-of an evolution of SMP to a scale larger than dual/quad CPU systems. The rest of it, the over-hyped massive "revolution" in how people use computing resources in general? Pure marketing hot-air, and nothing more. The closest we'll get to the idea of a worldwide "grid" will consist of an XBox-like home media console with anything-on-demand (for a fee).
Having done some work with Globus and Condor, it seems that your "cell architecture" is basically how things are set up now. Many institutions, like the group at the University of Illinois at Urbana-Champaign and the National Center for Supercomputing Applications (NCSA), have set up Grid nodes using toolkits and programs like Globus.
If you have an app which is Grid-enabled, a hydrology simulation for instance, you would get accounts on the various NCSA Grid nodes. Then you would use Globus or Condor, or the two in combination, to hand off your computation and data to the various Grid nodes, the nodes would compute, then give you back results. Your own computing cluster/Grid node could work on the results, have other nodes do more computing, etc until finished.
This reduces communication over the internet and keeps most communication to local networks. However, even if you did want to do communication between nodes, it wouldn't be as bad as you point out. Most nodes are at universities that are on Internet 2 and have huge amounts of bandwidth available and low latency.
Commercial uses of Grid computing may differ as they will have difficulty using I2 or something like NCSA, but I'm sure the market will fix this problem.
My $.02 at least.
Sometimes I feel like a nut... Ok so it's most of the time
As we discovered early on in MIMD parallel computing, MIMD (aka grid computing) parallelism can only really help processes that are CPU bound in the first place.
Most of the processes that require 'big iron' are memory bound and I/O bound--e.g. databases that are hundreds of gigabytes to terabytes in size. This is why so many CPUs are '90% idle' in the first place, and this is why system designers devote more attention to bit-striping their disks, a good RAID controller, bus speeds, disk seek time and so forth.
Problems that require brute-force computation on small amounts of data, and produce small results, are simply few and far between -- and the people addressing those problems have been onto MIMD for decades. For instance, my first publication, in 1987 to the USENIX UNIX on Supercomputers proceedings, involved putting ODE solvers wrapped in Sun RPC, so that hundreds of servers could work on a different part of initial condition and boundary condition space, to provide a complete picture of the properties of certain nonlinear ordinary differential equations. Cryptanalysis and protein folding problems are already being addressed in a similar manner, and the tools to distribute these services as well as the required communications standards have been around for more than a decade.
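As a toy illustration of that kind of parameter sweep, here is a Python sketch. It is not the 1987 code: a local thread pool stands in for the RPC servers, and the logistic equation dx/dt = x(1 - x) with a forward-Euler solver is just a convenient nonlinear ODE picked for the example. All function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def integrate(x0, dt=0.01, steps=1000):
    """Forward-Euler solve of dx/dt = x * (1 - x) starting from x(0) = x0."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1.0 - x)
    return x


def solve_chunk(initial_conditions):
    """What one 'server' does: solve its own slice of the parameter space."""
    return [(x0, integrate(x0)) for x0 in initial_conditions]


def sweep(initial_conditions, workers=4):
    """Split the initial conditions across workers and gather the results."""
    chunks = [initial_conditions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(solve_chunk, chunks))
    return sorted(pair for chunk in results for pair in chunk)


grid = [i / 10 for i in range(11)]   # initial conditions 0.0, 0.1, ..., 1.0
final_states = dict(sweep(grid))
for x0, x_end in sorted(final_states.items()):
    print(f"x0={x0:.1f} -> x(10)={x_end:.4f}")
```

Each chunk needs only a few numbers in and a few numbers out, which is exactly why this class of problem distributes well and most database-style workloads do not.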
Furthermore, if you've already got a marginally communications-bound domain decomposition of a parallel problem, and you want to cut down the communications overhead in order to take advantage of MIMD parallelism, the last communications protocol you're going to use is a high-overhead one such as CORBA, or a text-based message protocol such as XML. Both XDR and MPI are faster, more stable and better established in the scientific computing community than Yet Another layer of MIMD middleware--which is all Grid Computing is.
I had to laugh while reading this article. I've never heard of Grid computing before. However, about a month ago while sitting on the can, after just setting up my first cluster using OpenMosix, I had a very similar idea. Given a worldwide fibre network, systems similar to distributed.net could be set up, but simply to share idle processor cycles, with the hope that when you are occasionally doing local computations that are redlining your proc, the offending processes could be sent out to the distributed cluster you are a member of. Sort of like p2p processor sharing. Of course, someone would have to write a kernel level dynamic memory allocation capable clustering thang. That definitely wouldn't be fun. Especially in win32.
LEARNING, n. The kind of ignorance distinguishing the studious. A. Bierce, The Devil's Dictionary
Here is a large Grid project that I'm working on.
from what I understand they did this at squaresoft during the making of the final fantasy movie. idle cycles were used for rendering during the day when the cpu was not floored at 100% usage. At least that's what i understood from the articles that i read about it. workstations were also tied to the dedicated render farm at night. Since many of the artists had more than one machine at their desk it probably worked out well both at night and during the day.
stupid pictures :-)
How 'bout this - build in a system whereby users who have downloaded a file can mod its quality up or down. Then while searching the network, you also get MD5s for the files, and the associate rating is accumulated from others who you search. This way crappy files float to the bottom.
There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin
I would love a distributed beowulf cluster, there are several projects I need to do.
Only problem with distributing internal stuff to external machines is trust, or better yet, deniability.
So ripping and compressing 1000 DVDs or 1million MP3s with better quality is probably not a good idea unless there is some method to cloak what is happening.
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
I've been writing (partially for my CompSci Masters thesis) a new Grid-oriented application that may be of interest. It's called GridShell, and aims to provide a Free/OSS interface to any and all Grid technologies. Currently, GridShell's skin-able web-based UI (WebUI) is almost completed, able to provide the equivalent of an expert Grid administrator/user through a very "clicky-clicky" frontend.
Oh, and it's all 100% Object-Oriented Perl, for those of you who care about clean code.
More really crazy GridShell modules are down the road, so check it out!
http://www.gridshell.org
-Will the Chill
Creator of RPerl, Scouter, Juggler, Mormon, Perl Monger, Serial Entrepreneur, Aspiring Astrophysicist, Community Organiz
We're using it to create a highly available, highly scalable, easy to manage, high performance service broker system.
User says I want service blah, service broker manages where to run the blah application. You can kill loads of machines and the service continues to exist on the network.
Government of the people, by corporate executives, for corporate profits.
Grid computing has been the "next big thing" for the last (at least) five years.
First it was Globus... now companies have latched on to that whole idea, hoping it'll be "the thing".
And you know what? It's not.
There are two points to all this:
The people who are already involved with this have already declared victory, and are going to work towards this, no matter if it's going to work or not. After five years of pushing it, you'd think they'd get the idea that people just aren't buying into it... at least not to the scale they think people will. Five years from now, we'll be getting posts from some group of poor /.-ers that are forced to maintain all this stuff, while everyone else has already moved on long ago.
The second point is, don't believe anyone that says "it's the next big thing". The "next big thing" is going to be something that just sneaks up on everyone, the way Mosaic and Netscape did back in the mid-90s. It wasn't "the next big thing" until it was ALREADY "the big thing".
Grid computing is pretty interesting. If anybody wants to find out more, I have compiled a comprehensive list of references on the subject, as well as a brief (20 page or so) overview of the available Grid solutions: www.netinvasions.com/files/GRID/grid-paper.htm
The projects that the grid is best at are pretty much the areas that already have 'grid' projects: biochemistry, genetics, SETI and some maths problems. Among those I include one of the most appropriate maths problems for the grid, the brute force password attack. How long before the US Gov. starts a Patriot@home grid to brute force any encrypted files it wants to see, in the name of homeland security...of course.
My spelling isn't bad, I'm evolving the language
Can you give that in metric for us euros ;-) ?
I do software and sysadmin for scientists. Those with simulation or data analysis needs usually work either:
- connecting remotely to a main computer (say SGI in many cases) to run their jobs, at a high price for hardware and support and at the risk of saturating the machine when everyone wants in;
- or, with the more recent increase in computer power in PCs, running directly on their own PCs.
In both cases the PCs are underutilized most of the time. OpenMosix is a patch to the Linux kernel allowing you to transform your workstations into a cluster. No software modification is necessary. OpenMosix balances the load automagically. No more expensive mainframe. No more powerful but underutilized PCs. OpenMosix has been featured on /. several times before.
Non-Linux Penguins ?
Intel reveals Itanium 2 glitch
By Stephen Shankland Staff Writer, News.com May 12, 2003, 12:36 PM PT
CUSTOMERS TOLD TEMPORARY REMEDY: Until the next iteration of chip arrives though, Oliver Wendell Jones writes, "they recommend working around the problem
by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz."
Intel disclosed an electrical problem Monday that can cause computers using its flagship Itanium 2 processor to behave erratically or crash.
Customers can sidestep the problem by setting the processor to run at a lower speed, said company spokeswoman Barbara Grimes, and Intel will replace the
processor if customers want. The glitch only affects some chips, and then only in the case of "a specific set of operations in a specific sequence with
specific data," according to Grimes.
"If the customer feels it's the right solution, we'll exchange processors with ones that aren't affected," she said. Intel has developed a simple software
test that can determine whether a chip is affected.
The problem likely is fairly uncommon, Insight 64 analyst Nathan Brookwood said. "These machines have been out there for a year, and it only now is showing
up, so it's got to be fairly rare. If it's something that was more commonplace, we would have seen it a lot sooner, or they would have found it in their
alpha or beta testing."
Still, the problem is a black eye for Intel, which has been positioning its Itanium line to take on high-end chips from Sun Microsystems and IBM for use in
powerful servers with dozens of processors.
"Virtually everybody has these kinds of problems," Brookwood said. "When you consider the hundreds of millions of transistors that go into these complex
designs, it's amazing we don't see these more often."
The Itanium 2 has data protection features and a 64-bit design that can handle vast amounts of memory, making it better suited to high-end servers than
32-bit processors such as Intel's Xeon and Pentium. Its performance has been good enough to boost Windows servers to the upper echelons of the server market,
but the processor family's arrival has been clouded by initial delays and by the difficulties of running software written for Pentium chips.
A computer maker found the electrical problem in stress testing earlier this year, and Intel confirmed it was a problem with the chips, not the software or
other parts of system design, Grimes said. The problem affects both 900MHz and 1GHz versions of the Itanium 2, code-named McKinley. However, it doesn't
affect a faster 1.5GHz successor--called Itanium 2 6M and formerly code-named Madison--that is set for release in mid-2003, she said.
The ripple effect
The problem has begun rippling through the computer industry. IBM said Monday that it has put shipments of its just-released x450 Itanium 2 server on hold
until the glitch is fixed and is notifying customers that have the systems.
"Until we're sure the issues are 100 percent resolved, we're going to keep holding back shipments with the 450," IBM spokeswoman Lisa Lanspery said. "We have
a policy of zero tolerance for undetected data corruption" at a customer site, she said.
The move doesn't affect IBM's overall Itanium plans, which include a server based on the Itanium 2 6M and planned for later in 2003, she said.
Hewlett-Packard, which co-developed the Itanium design and is building the processor family into its entire server line, said computer shipment plans aren't
affected because it's screening affected systems before they ship. The company is working to help customers that already bought the systems.
"We'll do whatever meets the customer's total satisfaction," said HP spokeswoman Kathy Sowards. "We're working very closely with Intel to come to a
resolution for any customers th