Distributed Computing Overview
Fruitiger writes: "Well, P2P / distributed computing is all the rage these days, so if you want a good breakdown of who's doing what when, check out this article at Network World Fusion. Focuses on Porivo Technologies and provides some glimpses of what's to come in the future. An interesting appetizer before Intel's P2P Working Group meeting later this week."
I dunno... something sounds shady about that name. Fusion? Network? World? Weird.
SETI has lost a lot of its appeal, even to geeks. How many users would a project with no appeal attract?
This peer-to-peer networking concept is good, but there seems to be an issue of trust in play.
It's like when you share a bathroom with someone. Generally it's ok to share a toilet, sink, and the same roll of toilet paper (as long as users aren't there at the same time). But you don't want to share your bathroom with some stinky loser. It might lead to your comfortable room with the reading stool looking like a public restroom.
regarding the market...
The concept of "sign up for our ISP and get a 'free' computer" was wildly popular... How about "run our software and get a free computer"?
This has MANY advantages, including:
1) It really is free, you don't have to pay for the ISP service (which was more like financing)
2) Parents can get computers just for their kids, and while the kids are in school/asleep the computers can be running various routines and be paying themselves off
In turn, this would help increase the number of people with computers, since poor people wouldn't have to pay for the computer, just not use it all the time... and that, in turn, would increase computer literacy...
I'm also sure geeks/gamers would love this opportunity, since it's a way to get a powerhouse computer (or computers?) for free, or at least relatively cheaply.
and that's only the beginning....
I'd like to see how this turns out, and how it gets used/abused...
--------------
Seeing the popularity of projects such as distributed.net's or Seti@Home just goes to show that distributed computing is definitely "in".
I have just one question about this: how well would it work with single processes, or is it still limited by the same data-dependency problems that most parallel-processing and multiprocessing schemes encounter?
If someone has found a way to take a single application (one that knows nothing about parallel processing) and get it to run across multiple machines, they could be in for some serious cash.
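The data-dependency question is usually answered with Amdahl's law: if only a fraction p of a program can run in parallel, N machines can speed it up by at most 1/((1 - p) + p/N). A minimal Python sketch of that bound (the function name is ours, not from any product mentioned here):

```python
def amdahl_speedup(parallel_fraction: float, machines: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of a job can be split up.

    Amdahl's law: S(N) = 1 / ((1 - p) + p / N). The serial part (1 - p), e.g. steps
    that depend on the previous step's result, runs at single-machine speed no
    matter how many machines join.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / machines)


if __name__ == "__main__":
    # Even 1,000 idle desktops barely help a job that is mostly serial.
    for p in (0.5, 0.9, 0.99):
        print(f"p = {p:.2f}: at most {amdahl_speedup(p, 1000):.1f}x on 1000 machines")
```

With p = 0.9, a thousand machines still give less than a 10x speedup, which is why an application that knows nothing about parallelism gains little from a big pool of desktops.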
Instead of purchasing more hardware and software and hiring the IT staff needed to set up and support it, an emerging technology called peer-to-peer (P2P) computing will let users access valuable resources when they aren't being used.
Pardon me, but how do they figure P2P will implement itself? Any new tech needs a human being to set it up, maintain it, format the results and queries, and explain the whole thing to the boss. This tech will save companies money by letting them get more computing power for less cash, but the upkeep and back end are still needed. Maybe they figure this will be a nice little project for all those lazy IT folks just lounging around doing nothing... I'm certain the SETI thing isn't running itself...
Porivo's Peer client, which resides on a user's desktop, works with the company's PeerPlane management software, which can reside on a dedicated server.
Great, we all have an extra dedicated server lying around...
Well, I guess that is one solution to the processors-are-faster-than-the-network problem: slow the processors down. Seriously, the whole point of distributed computing is to be able to solve large problems quickly. Having a thousand computers running your code doesn't help you when they are running everything inside a Java sandbox; it will end up slower than a single computer running a C implementation of the code.
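To put rough numbers on the processors-versus-network point, here is a back-of-the-envelope model in Python. Everything in it is an assumption for illustration: one shared 10 Mbit/s link out of the coordinator, 1 MB of data per work unit, no overlap between transfer and computation, and a sandbox that runs 10x slower than native code.

```python
import math


def local_time(units: int, unit_seconds: float) -> float:
    """Run every work unit back to back on one machine with native (e.g. C) code."""
    return units * unit_seconds


def distributed_time(units: int, unit_seconds: float, unit_bytes: float,
                     link_bytes_per_s: float, machines: int, slowdown: float = 1.0) -> float:
    """Crude model of farming the same units out to `machines` peers.

    Assumptions (ours, for illustration): every unit's data crosses one shared
    link out of the coordinator, transfers do not overlap with computation,
    and each peer runs `slowdown` times slower than native code (a sandbox).
    """
    transfer = units * unit_bytes / link_bytes_per_s
    rounds = math.ceil(units / machines)
    return transfer + rounds * unit_seconds * slowdown


if __name__ == "__main__":
    ten_mbit = 10e6 / 8  # bytes per second on a 10 Mbit/s link
    for unit_seconds in (1.0, 0.1):
        print(unit_seconds,
              local_time(1000, unit_seconds),
              distributed_time(1000, unit_seconds, 1e6, ten_mbit, 1000, slowdown=10.0))
```

Under those assumptions, 1,000 one-second units take 1,000 s locally and about 810 s on 1,000 peers, since 800 s of that is spent on the wire; shrink the units to 0.1 s and the farm loses badly (about 801 s versus 100 s).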
This sort of thing is very cool, and rather attractive. While I wager it isn't a drop-in solution that suddenly opens up a nirvana of computing, it *is* something that has been predicted (I read about it on Slashdot a while back, anyway), and it would be cool to see it come around.
Not only does this help the average Joe user (he can sell his CPU time to people who need to compile something), but think of it in a workplace.
Where I work, everyone has a workstation. The problem is, everyone does their CPU-heavy work on a central server, so these workstations are underutilized. Sun (and a number of other people) have this problem halfway addressed: they have software that queues up tasks and distributes them to idle CPUs as they free up.
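The queue-and-dispatch approach described above can be sketched as greedy list scheduling: keep a heap of when each workstation next becomes idle and hand the next queued job to the earliest one. This is a simulation in Python, not Sun's software; the host names and job lengths are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def dispatch(job_seconds: Sequence[float],
             workstations: Sequence[str]) -> Tuple[List[Tuple[int, str, float]], float]:
    """Hand each queued job to whichever workstation becomes idle first.

    Returns the (job index, host, start time) assignments and the time the
    last job finishes. Jobs are taken strictly in queue order.
    """
    if not workstations:
        raise ValueError("need at least one workstation")
    idle_at = [(0.0, host) for host in workstations]  # (time host is free, host)
    heapq.heapify(idle_at)
    assignments = []
    for index, seconds in enumerate(job_seconds):
        start, host = heapq.heappop(idle_at)          # earliest-free CPU
        assignments.append((index, host, start))
        heapq.heappush(idle_at, (start + seconds, host))
    finish = max(t for t, _ in idle_at)
    return assignments, finish
```

For example, `dispatch([4, 2, 2, 1], ["ws1", "ws2"])` sends the two short jobs to `ws2` back to back while `ws1` grinds through the long one.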
What if we had a sort of "CPU NFS" in a way that individual instructions are handed off to remote machines, rather than entire jobs?
Mm, I want.
Of course, any such idea would be riddled with difficulties (I wager the complications would be like the ones NFS has, only worse), but the idea, again, is attractive.
Is this something Mojonation could expand to? They mention several times that you 'donate cpu cycles', but it never seems to directly state you can sell build time on your machine.
I'd like to see that.
I run Frontier(tm) from Parabon Computation (http://www.parabon.com). It uses distributed computing to perform cancer research. That's one of the best uses for P2P I've seen so far.
You can read that article for a glimpse of the future of distributed computing, or you could just save time and ask me.
;)
------------
"John, what is the future of Distributed computing?"
I'm glad you asked. Coming up soon are three more business schemes and companies that will take applications for future testing and promise you'll make $$Big Bucks$$ from your spare CPU cycles. The companies will then stop updating their news page about a month before disappearing altogether.
Next up: Slashdot re-runs an article from June's issue of Wired!
------
Let me give you the lowdown
Well, since it's now obvious, who's going to patent it?
Why do you people promote private companies' stolen ideas? What about an effort on something REALLY distributed, so there won't be any monopolist around?
You want distributed computing? Just sit back, relax, and wait for .Net to appear.
It's gonna blow you away.
Back in the day (1991? 92?), batch.uu.net was an expensive Sun 640MP that just couldn't cut it. Pushing USENET through the box was hard enough. Compressing batches of news articles for dialup customers was more than it could handle. Instead of buying more hardware, our fearless leader came up with the idea of using all of the idle machines in the office at night (Sparc SLC/ELC/SS1/SS2) to run compress on them through rsh pipes.
Moral - a good sysadmin in your hand is worth two P2P sales reps in the bush.
For the most free computing power at your fingertips, hire script kiddies.
Didn't Sun just release the Sun Grid Engine the other week? It does the same thing: it runs distributed, compute-intensive jobs on idle systems. Of course, you can download it now, so there's no need to wait, and they promise to release it under an open source license in the near future.
Peer-to-Peer working group meets this week to define how to share unused CPU, storage capacity across nets.
By APRIL JACOBS
Network World, 10/09/00
Intel, Hewlett-Packard, IBM and a slew of start-ups will meet this week to set up the structure for a working group that would give corporate customers a new way to harness the collective power of networked PCs, workstations and servers for computer- and storage-intensive jobs.
Instead of purchasing more hardware and software and hiring the IT staff needed to set up and support it, an emerging technology called peer-to-peer (P2P) computing will let users access valuable resources when they aren't being used. The result: Users could save millions of dollars by tapping unused processing and storage resources.
P2P basically sets up a virtual supercomputer by allowing the exchange of data among multiple computers connected via a network. The software that powers Napster and Gnutella is often held up as the best example of the power P2P can harness.
In addition to next week's meeting, at least two firms, Porivo Technologies and Mangosoft, will soon announce P2P products aimed at corporate network customers. Intel is testing a new peer-to-peer application that the company says will save WAN bandwidth and deliver applications and data more quickly than existing technologies.
Porivo will roll out Peer, a secure, Java-based application designed to let users harness spare PC computing capacity, says Will Holmes, CEO at Porivo. Porivo's Peer client, which resides on a user's desktop, works with the company's PeerPlane management software, which can reside on a dedicated server. PeerPlane essentially aggregates the computing resources of PCs connected to corporate networks, letting users distribute work among them.
Mangosoft next week plans to announce Mangomind, which it is billing as the first multiuser, Internet-based, file-sharing service that provides real-time file sharing for secure business communications. The new service is a secure way for multiple users to access, share and store files. Mangomind will let users work on their files offline. When users go back online, Mangomind automatically updates and synchronizes their files.
In the Groove
Another member of the working group - Groove Networks - plans a highly anticipated Oct. 24 rollout of its P2P technology, which will be aimed at collaborative computing. Groove's founder Ray Ozzie created Lotus Notes.
P2P could take many avenues in meeting the computing needs of end users, much as the Web has become more than a tool to deliver simple page requests, says Andrew Mahon, evangelist at Groove.
Mahon declined to provide specifics about Groove's product (for more on Groove, see 'Net Buzz, page 98).
One company interested in Porivo and other P2P technologies is United Technologies Research Center - the research arm of United Technologies. Paul Kirschner, a senior project analyst at United Technologies, is looking at how his company can harness the power of computers across the company to do production work.
What Kirschner likes is the idea of being able to do massive compute jobs that might otherwise mean buying more expensive hardware and software.
"Obviously, if you look at the number of desktops across the company, there are tens of thousands," that could potentially be tapped, he says. "To use what is just sitting there doing nothing quite a bit of the time is what makes this attractive because if you looked at replacing that power with another box, another cluster, that would represent a significant investment."
As a result, Kirschner expects to have P2P technology up in some capacity by year-end.
Kirschner likes Porivo's offering because the desktop client works with Windows 95. Others, such as TurboLinux's EnFuzion software, only support Windows NT and various flavors of Unix.
But that doesn't mean he's ready to bet the farm on P2P.
"The technology is new, and how it is going to play in the corporate environment isn't certain yet," he says. "People will not tolerate it if their machines crash, slow down or get locked up, or if unusual things happen."
While EnFuzion may not fit into United Technologies' infrastructure, it has found a home elsewhere. TurboLinux announced earlier this year that J.P. Morgan is using the software to help power the firm's worldwide risk management system for fixed-income derivatives.
Cheryl Currid, president of the Currid & Company consultancy, says P2P's big draw for corporate customers is processing power that companies don't know they have. "What they can get from peer-to-peer is low-cost, high-capability processing and storage."
Currid says users can benefit from P2P to varying degrees - depending on how much effort they put into incorporating it into their infrastructure. While engineering and scientific jobs are a logical place for P2P, more commonplace financial applications are what could put it in the spotlight. "Imagine if your trades could come back to you three times faster because your company was using P2P to process them in real time, instead of having to do big periodic batch jobs," Currid says.
Intel is in
Intel is also using P2P. The company made a lot of noise recently when it talked about how it saved $500 million over the past 10 years using a P2P application called Netbatch. The application lets Intel engineers harness more than 10,000 workstations across Intel's network to do compute-intensive jobs for chip design, says Manny Vara, an Intel spokesman.
"Every time we were designing a new chip, we were buying a bunch of new mainframes to get the job done - and that was just one area," he says.
Vara says Intel is testing a new application that goes even further. He says Intel will try out a system that will detect when employees access the WAN to retrieve video files. If another employee at the same location has already downloaded it, the P2P application will retrieve it from that system where it has been stored instead of going over the WAN to get it.
What network managers will likely debate as P2P gains momentum is how to use it without slowing systems. Currid says estimates are that 75% of the average PC and 60% of the average server go unused.
Busy signal
But what about when they are busy?
P2P software from companies such as Entropia, another member of the P2P Working Group, lets customers set policies that govern when computer resources can be harnessed. Using Entropia's screen saver makes it fairly easy. The computer's resources are only used when the screen saver comes on. The moment it turns off, indicating the machine is going to be used, the P2P processes are halted.
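A minimal sketch of the screen-saver policy in Python. The `screensaver_on` and `compute` callables are hypothetical hooks; the article does not say how Entropia detects the screen saver or whether it suspends work rather than discarding it.

```python
from typing import Callable, Iterable, List


def harvest_while_screensaver(work: Iterable[int],
                              screensaver_on: Callable[[], bool],
                              compute: Callable[[int], int]) -> List[int]:
    """Process work units only while the screen saver is showing.

    The check runs between units, so a unit already in progress finishes
    before the loop notices that the user is back; then the loop stops.
    """
    results = []
    for unit in work:
        if not screensaver_on():
            break  # machine is about to be used: halt the P2P work
        results.append(compute(unit))
    return results
```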
Many P2P questions will hopefully be answered by the working group set to meet in San Jose.
The meeting will be more organizational than anything else, according to Intel's Vara. The members will organize into task-related groups that will determine how to solve issues related to interoperability, standards and security.
Other members of the working group include Applied MetaComputing, CenterSpan, Distributed Science, Dotcast, Enfish Technology, Engenia Software, Flycode, Kalepa, Statis, United Devices, Uprizer and Vtel.
Microsoft announced today that it will release a Windows version of P2P software "soon, in fact before any of those other companies can do it. If it looks like one of them will beat us, we'll buy them anyway, so you should wait for ours." The first release will run only on Windows ME, and will most likely be named "P-On-ME".
It was also revealed today that the first virus for P-On-ME has been discovered. It is contained in email messages with the subject line "Do Not Open - Virus Inside".
A week ago I set up the JAPPS (Java Applet Parallel Processing Server) download page. Since then there have been 5 or 10 downloads, but nobody has commented on it. It's a good example of a distributed processing OS, kind of like SETI@home but a multi-user, multi-purpose system, and it needs no installation (it runs in a browser). Please, somebody make a DES key cracker with it and send it to me :)
The URL is http://wk1300.8k.com/japps
I believe Sun already released a distributed resource package called "Sun Gridware"
http://www.sun.com/software/gridware/
It looks like it is available for download already, and the page says the code is slated to be released under an "industry-accepted open source license". I know that phrase will probably raise a few hackles, but it's better than what you'll get from most companies.
Wasn't something like this developed a long time ago at UW (University of Wisconsin-Madison)?
http://www.cs.wisc.edu/condor/
Here is an excerpt from their page.
What is Condor?
Condor is a software system that runs on a cluster of workstations to harness wasted CPU cycles. A Condor pool consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network. To monitor the status of the individual computers in the cluster, certain Condor programs called the Condor "daemons" must run all the time. One daemon is called the "master". Its only job is to make sure that the rest of the Condor daemons are running. If any daemon dies, the master restarts it. If a daemon continues to die, the master sends mail to a Condor administrator and stops trying to start it. Two other daemons run on every machine in the pool, the "startd" and the "schedd". The schedd keeps track of all the jobs that have been submitted on a given machine. The startd monitors information about the machine that is used to decide if it is available to run a Condor job, such as keyboard and mouse activity, and the load on the CPU. Since Condor only uses idle machines to compute jobs, the startd also notices when a user returns to a machine that is currently running and removes the job.
Sounds quite similar.
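The master's restart policy in the Condor excerpt can be modelled in a few lines of Python. The `start` and `mail_admin` hooks and the `max_restarts` threshold are our assumptions standing in for "continues to die"; this is not Condor's code or configuration.

```python
from typing import Callable, Dict, Set


class Master:
    """Toy model of the restart policy the excerpt attributes to Condor's master."""

    def __init__(self, start: Callable[[str], None],
                 mail_admin: Callable[[str], None], max_restarts: int = 3):
        self.start = start                # hypothetical hook that launches a daemon
        self.mail_admin = mail_admin      # hypothetical hook that emails an admin
        self.max_restarts = max_restarts  # our stand-in for "continues to die"
        self.restarts: Dict[str, int] = {}
        self.given_up: Set[str] = set()

    def daemon_died(self, name: str) -> None:
        if name in self.given_up:
            return  # already reported; stay quiet
        count = self.restarts.get(name, 0)
        if count >= self.max_restarts:
            self.given_up.add(name)
            self.mail_admin(f"{name} keeps dying; no longer restarting it")
            return
        self.restarts[name] = count + 1
        self.start(name)
```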
As we all know, P2P is nothing new; it is basically distributed computing packaged under the glossy P2P label. That really annoys me, but what is far worse is that articles like the one in Network World Fusion are largely incorrect on the technical side, and oversimplified.
There is a lot of technology that has existed for years and performs similar functions, such as distributed OSes, databases and file systems, and cluster technologies. Of course, not all of these implementations focus on what the system in this article is meant to achieve, but some of them will do exactly the same job.
There are things missing from the article, and some of its suggestions are humorous.
For example:
"Intel is testing a new peer-to-peer application that the company says will save WAN bandwidth and deliver applications and data more quickly than existing technologies"
This, by the way, is known as a distributed/hierarchical caching proxy.
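A flat, single-site version of the peer lookup that Intel describes (and that a hierarchical caching proxy generalizes) might look like this in Python. The class, the peer names and the `fetch_over_wan` hook are hypothetical; a real caching proxy would add parent caches, expiry and eviction.

```python
from typing import Callable, Dict, Set


class SiteCache:
    """Ask peers at the same site for a file before fetching it over the WAN."""

    def __init__(self, fetch_over_wan: Callable[[str], bytes]):
        self.fetch_over_wan = fetch_over_wan
        self.stores: Dict[str, Dict[str, bytes]] = {}  # peer -> files it holds
        self.holders: Dict[str, Set[str]] = {}         # file -> peers holding it
        self.wan_fetches = 0

    def get(self, peer: str, name: str) -> bytes:
        local = self.stores.setdefault(peer, {})
        if name in local:
            return local[name]                           # already on this machine
        others = self.holders.get(name, set()) - {peer}
        if others:
            data = self.stores[sorted(others)[0]][name]  # copy across the LAN
        else:
            data = self.fetch_over_wan(name)             # first request at this site
            self.wan_fetches += 1
        local[name] = data
        self.holders.setdefault(name, set()).add(peer)
        return data
```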
"Mangosoft next week plans to announce Mangomind, which it is billing as the first multiuser, Internet-based, file-sharing service that provides real-time file sharing for secure business communications. The new service is a secure way for multiple users to access, share and store files. Mangomind will let users work on their files offline. When users go back online, Mangomind automatically updates and synchronizes their files."
This already exists as AFS or Coda. Coda allows caching and disconnected operation and, believe it or not, is architecture independent, which allows you to operate over the Internet.
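Mangomind's offline mode and Coda's disconnected operation both come down to logging edits while disconnected and reconciling them on reconnect. The Python sketch below assumes a simple per-file version number and treats any server-side change since the last fetch as a conflict; the class and its methods are hypothetical, and Coda's actual reintegration is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version number, contents)


@dataclass
class OfflineClient:
    """Hypothetical client that logs edits while disconnected and replays them later."""
    server: Dict[str, Versioned]                               # stand-in for the shared file server
    cache: Dict[str, Versioned] = field(default_factory=dict)  # last version seen per file
    pending: Dict[str, str] = field(default_factory=dict)      # edits made while offline
    online: bool = True

    def fetch(self, path: str) -> Optional[str]:
        self.cache[path] = self.server.get(path, (0, None))
        return self.cache[path][1]

    def edit(self, path: str, contents: str) -> None:
        if not self.online:
            self.pending[path] = contents  # server unreachable: keep it locally
            return
        version = self.server.get(path, (0, None))[0]
        self.server[path] = self.cache[path] = (version + 1, contents)

    def reconnect(self) -> List[str]:
        """Replay offline edits; return files someone else changed meanwhile."""
        self.online = True
        conflicts = []
        for path, contents in self.pending.items():
            base = self.cache.get(path, (0, None))[0]
            current = self.server.get(path, (0, None))[0]
            if current != base:
                conflicts.append(path)  # leave it for a human to merge
                continue
            self.server[path] = self.cache[path] = (current + 1, contents)
        self.pending = {p: c for p, c in self.pending.items() if p in conflicts}
        return conflicts
```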
"Kirschner likes Porivo's offering because the desktop client works with Windows 95. Others, such as TurboLinux's EnFuzion software, only support Windows NT and various flavors of Unix."
Who, except Kirschner, would ever consider using Windows 9x for what is inherently concurrent computing? Windows 9x is not capable of proper concurrency.
But what is really lacking in this article is a description of the costs and consequences of implementing this in an environment.
Look at the top diagram: it points out that you can share/distribute CPU and disk usage. But for that to happen, the clients would need several upgrades. First, you would need something other than an IDE disk on each client, since IDE is not capable of concurrency. Second, you would need to upgrade your network, because it won't work efficiently on a 10Mbit network; you would need a Gbit network, and that includes the rest of the network infrastructure. Third, you need user software that is capable of distributing processing jobs among the processors in the cluster, i.e. heavily threaded software. All this costs money, and the software part still might not be available for the software the company is using.
There is also the issue of how much processing power is gained: you have to analyse the load on the computers where you want to implement this to see how much can actually be gained. Of course there is a lot of unused power lying around, but if the gain is only 10%, then it is probably not worth it.
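The "is it worth it" analysis is simple arithmetic once you have measurements. This Python sketch uses the article's 75% idle figure for the average PC, 10,000 desktops as a round stand-in for Kirschner's "tens of thousands", and a 50% efficiency factor that is purely an assumption covering scheduling, transfer and evicted jobs.

```python
def machine_equivalents(desktops: int, idle_fraction: float, efficiency: float) -> float:
    """Dedicated machines' worth of capacity recoverable from idle desktops.

    `efficiency` is an assumed discount for scheduling, data transfer and jobs
    evicted when users come back; measure it rather than trusting a guess.
    """
    return desktops * idle_fraction * efficiency


def worth_it(current_machines: float, gained: float, threshold: float = 0.10) -> bool:
    """The comment's rule of thumb: a gain of 10% or less is probably not worth it."""
    return gained / current_machines > threshold
```

With those inputs the pool is worth about 3,750 dedicated machines; the real question, as the comment says, is how that compares with the capacity you already have.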
There is also the security aspect, especially for distributed disks, and especially if you are using those disks to store company documents. The machines would be physically more accessible to thieves. And what if the user turns off the machine without shutting it down? In the worst case, that might make documents inaccessible or lose computing data.
Therefore, data storage should stay on server clusters instead, and the clients should only be used for CPU/memory sharing.
Lastly, the article asks what to do if the computer is busy, and its answer is that P2P should only run when the screensaver is running.
My question about that is: how often does the screensaver run compared to the load on the computer throughout the day? Not often. Therefore you would not get much benefit from the "P2P" technology. A simple but much better solution is to use a priority configuration for local and networked processes; the simplest would be to give local processes higher priority than networked processes. Of course, a priority system is by no means simple, so it can't be done out of the box; it depends on what really needs to run at higher priority than other processes.
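The simplest priority configuration mentioned above, local processes ahead of networked ones, can be sketched as a two-level run queue in Python. It is a toy scheduler for illustration, not how any OS or P2P product actually implements it.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

LOCAL, NETWORK = 0, 1  # lower number is served first


class PriorityRunQueue:
    """Strict two-level priority: the user's local tasks always go before P2P work."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a level

    def submit(self, task: Any, source: int = NETWORK) -> None:
        heapq.heappush(self._heap, (source, next(self._order), task))

    def next_task(self) -> Optional[Any]:
        return heapq.heappop(self._heap)[2] if self._heap else None
```

Strict priority like this can starve the networked work indefinitely when local load stays high, which is part of why a real priority system is not simple. On Unix, a coarse OS-level version of the same idea is to start the P2P worker under `nice` (or call `os.nice` from Python) so the kernel favors interactive processes.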
There is probably a lot I forgot to mention, but what I have stated certainly applies as a critique of this article and of P2P in general.
Glo
--
Entropia 2000 is designed to support multiple P2P projects for charities. Currently FightAIDS@HOME, a P2P project to search for new AIDS drugs, is up and running for users with a non-dial-up Internet connection.