Future Of Internet-Based Distributed Computing
miss_america writes: "CNN is running an article about how the Internet has fueled distributed/parallel computing. It talks about the limitations, implications and possibilities of internet-based distributed computing. The article highlights UC Berkeley's SETI@home project, Distributed.net, and the ProcessTree Network."
This got me really thinking as to some other 'legal' aspects. If you give part of your CPU time to a non-profit group would you be able to write it off? How many more people would run Distributed.net or SETI@Home if you got to write off $.01 a block or some such thing?
I think it would be an interesting avenue to pursue for the people running these sites. The only thing better than getting paid is getting paid from the government.
What CNN doesn't talk about is security for the participants' machines. Open source is helpful, because you can see what you're running, and people can find bugs in it, but that's far more effective for the first few special projects like GIMPS, distributed.net and SETI than it will be for running arbitrary code in a large distributed-processing industry. The worst case would be malicious distributed-processing code (either viruses or simple DDOS applications), but even non-malicious code with buffer overflow bugs could be a real disaster, both to the PC users and to whoever their machines might be used to target. It's possible to be somewhat safer by using sandboxed computing environments, such as Java, so everybody knows their machine will be safe, but they tend to be much slower than running compiled native applications. This can be improved somewhat by using standard compiled libraries, e.g. bignum calculations, but it's still a wide open problem.
Are there any environments you know about that are safer, or safe enough and faster?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The second item, and possibly the most important, is getting people to run a distributed client itself.... People need to be passionately involved to run distributed clients.
... but that's not the only way such clients can get (ahem) distributed.
No, people need to be passionately involved to install distributed clients
When a new employee joins our organization, he or she gets a computer with a "corporate image" on it; an approved operating system (NT, Linux, or Solaris) and the associated applications. If we had a corporate need for some sort of distributed computing, the client could be added to the image, so it would be part of every PC on every desk (or in every lap). With distributed administration tools, such clients could even be installed retroactively. It's the company's computer; is it so wrong for the company to direct its use? (Assume they're smart enough to set this up so it doesn't screw up employee productivity, which is more important than "computing.")
I think this model might have been used by the staff of the company that did the graphics for Babylon 5. I wouldn't be surprised if the NSA already does this. --PSRC
Stupid job ads, weird spam, occasional insight at
Over $5/month. You can collect this payment by turning your computer OFF when you're not using it and the "payment" will show up on your electric bill.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
you're providing FREE cpu cycles to seti@home, and you wouldn't mind seeing ads in the client? are you on drugs?
it's unpleasant enough to have to sit through commercials while I'm watching cable tv; let's not add another channel where advertisers can reach us while we're doing a company a favor.
When monitors and printers die, they tend to catch fire. Just ask WB Doner advertising in Detroit where a friend of mine used to work. Printer caught fire in the middle of the night and burned the entire business up. They had to tear the building down to the steel and concrete to fix the damage.
So turn off your printers at night too.
If tits were wings it'd be flying around.
Another such start-up aiming to take away your processor cycles is Centrata.
You should never take life too seriously - You'll never get out of it alive.
If I were an oil company, with hundreds of offices and tens of thousands of computers of a wide range of models, and I had a computing problem that could be solved by either buying a $120M supercomputer or developing a distributed protocol, I'd seriously look into a distributed protocol. After all, most computers are idle most of the time. Unfortunately, there aren't many problems for which both a supercomputer and a distributed protocol would be viable solutions.
-- Abigail
While the number of floating point operations per second surely has some merit, FLOPS speed is certainly not the strong point of supercomputers. A supercomputer is a device that turns a calculation problem into an IO problem. The ability of moving shitloads of data around in small units of time is what makes a supercomputer a supercomputer. In the foreseeable future, the bandwidth of the Internet isn't going to approach even a tiny fraction of the bandwidth of the IO channels in a supercomputer.
-- Abigail
One word:
Oblivium
---
---
"Multiple exclamation marks are a sure sign of a sick mind." (Terry Pratchett)
How would you know? Nobody has any data to indicate that it is indeed worth only a few pennies a month. You are assuming that ProcessTree would give any packet to anyone, and so everyone will want one? I am sure they keep track of who is reliable/fast and who is not and distribute their load accordingly. A load balancing scheme is definitely going to be in place.
Regarding motivation, the user is going to see that he has nothing to lose and everything to gain and just sign up. And people looking for relatively cheap computing might want to consider this, as opposed to running jobs on their local supercomputer cluster (which frequently are overloaded anyway). As long as there is enough demand, there will be supply.
The Byzantine Generals problem deals with exactly what McNett needs:
The Byzantine generals problem is formulated similarly. One formulation (the closest to this) is: N generals are on a hilltop, about to attack a city. K are traitors, who will interfere with any protocol in the most damaging way possible. They must agree on some piece of data (the time to attack the city) reliably. Here is a link with some explanations and implementations of the solution.
A commercial "Distributed.com" would have a simpler problem, because they can reliably a) authenticate a computer's identity, so they know if two messages come from the same computer, and b) they can assume that the server isn't a traitor. This will severely reduce the level of redundancy necessary. Still, they must deal with truly malicious nodes, whereas Distributed.net has only had to deal with faulty ones.
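To make the reduced-redundancy point concrete, here is a rough Python sketch of how a trusted server might check work units that it hands to several authenticated nodes. It is only an illustration under stated assumptions: the function names, the node names and the quorum rule are made up for this example and do not come from Distributed.net, ProcessTree or any real project. The assumption is that honest nodes always compute the same deterministic answer, so if at most K nodes on a work unit collude, K+1 matching reports must include at least one honest one.

```python
from collections import Counter

def accept_result(reports, quorum):
    """Return the value reported by at least `quorum` nodes, or None.

    `reports` maps an authenticated node id to the result it returned
    for one work unit. Hypothetical helper, not a real project's API.
    """
    if not reports:
        return None
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes >= quorum else None

def suspects(reports, accepted):
    """List the nodes whose report disagrees with the accepted value."""
    return sorted(node for node, value in reports.items() if value != accepted)

# Assumed setup: tolerate K = 1 bad node, so send each unit to
# 2K + 1 = 3 nodes and require K + 1 = 2 matching answers.
reports = {"node-a": 0x5EED, "node-b": 0x5EED, "node-c": 0x0BAD}
accepted = accept_result(reports, quorum=2)
print(hex(accepted), suspects(reports, accepted))
```

Because the server is trusted and identities are authenticated, the nodes never need to agree among themselves, which is where the full Byzantine protocol gets expensive. The server can also remember which nodes end up in the suspects list and stop sending them work.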
As for granulating the data so that K traitorous nodes cannot glean something useful from the data, this should be interesting information theory. I would think that adding some garbage data to calculate from, along with the real stuff, might be a decent cost/security trade-off.
As Mr. Old states in the article, these codes just don't lend themselves to this kind of high-latency, low-communication processing. In fact, to the best of my knowledge, all of the "potential users" the article mentions (seismic analysis, structural analysis, fluid dynamics, stress/crash testing) do not scale well AT ALL under this kind of system because the communication needed is far too frequent.
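A back-of-the-envelope model shows why frequent communication hurts so much. The Python sketch below is not taken from the article; the step count, compute time and latency figures are assumed purely for illustration, and the model ignores bandwidth and any overlap of computation with communication.

```python
def parallel_efficiency(compute_s, exchanges, latency_s):
    """Fraction of wall-clock time spent computing rather than waiting.

    compute_s  -- total seconds of useful computation on one node
    exchanges  -- number of blocking exchanges with other nodes
    latency_s  -- seconds lost per exchange
    """
    return compute_s / (compute_s + exchanges * latency_s)

# Assumed job: 1000 time steps, 10 ms of work per step, and one
# neighbour exchange per step.
compute_s = 1000 * 0.010
on_cluster = parallel_efficiency(compute_s, 1000, 0.0001)   # ~0.1 ms link
on_internet = parallel_efficiency(compute_s, 1000, 0.100)   # ~100 ms link
print(f"cluster: {on_cluster:.1%}, internet: {on_internet:.1%}")
```

With these made-up numbers the same code spends about 99% of its time computing on a local interconnect and about 9% over the internet. A SETI-style job that exchanges data once per multi-hour block barely notices the latency at all.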
Don't get me wrong, I think internet distributed computing has a future doing certain, very specialized jobs like rendering. I just don't see it becoming the "next big thing" for scientific computing anytime in the near (or even somewhat near) future.
- First, as they suggested with the SETI project, numerical accuracy is always a concern. Floating point mathematics (which is critical to 99.9999% of huge computing problems) varies widely from machine to machine. Results do vary across platforms.
- Secondly, use of the internet adds tremendously to communication overhead, compared to use of a local network. This means that some projects that would benefit from classical local parallelism may wind up being hurt by an internet scheme.
- Third, real industrial computations (oil-field computations included) tend to involve tremendously large and arcane libraries and datafiles that the user will have to copy. This will bloat the size of what the user has to download.
- Fourth, real industrial computation is extremely sensitive. I'm a grad student, and I've been working on a problem from a DOE lab. The only way we have a copy of the binary is due to our special connections with the lab. There is no way in the world a lot of "real" HPC code/binaries can be publicly distributed.
- User security is also an issue. Many of these codes have to do a bunch of disk I/O. What's to stop a "customer" from distributing a program that gathers user data and/or modifies disk files?
- A lot of HPC code is written in FORTRAN. 'nuff said.
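On the first point in the list above, one common reason results differ between machines is that floating point addition is not associative, so a compiler or CPU that sums in a different order can produce a different answer. The short Python sketch below shows that effect with IEEE 754 doubles. It illustrates only this one mechanism and is not a description of what happened with SETI@home. The `results_agree` helper and its tolerance are assumptions made for the example.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left_first = (a + b) + c    # one evaluation order
right_first = a + (b + c)   # same maths, different order

print(left_first == right_first)   # False with IEEE 754 doubles
print(left_first, right_first)

def results_agree(x, y, rel_tol=1e-9):
    """Compare two results with a tolerance instead of bit-for-bit."""
    return math.isclose(x, y, rel_tol=rel_tol)

print(results_agree(left_first, right_first))   # True
```

A project that cross-checks results from different platforms would therefore have to compare within a tolerance, or restrict itself to integer arithmetic, rather than expect identical bits.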
The internet still has a long way to go to be a real platform for high performance computing. Building yourself a Beowulf cluster and syphoning time off of your in-house linux boxes makes much more sense for now.
A major contender that didn't make it into this story is Parabon Computation. We're general-purpose (you can run anything on our system) and commercial -- we'll be publicly available to anybody who wants to run a job, and we'll pay people to run an engine (or allow them to donate time or payment to good causes). Our server and engine are robust, scalable, safe (security was a major design consideration), and ready for the big time -- we're doing an open beta test now (http://www.parabon.com). We even have clients running already -- biological computation, even very cool photorealistic rendering (http://www.parabon.com/challenges.jsp). We're poised to do some really cool things -- and we're much further along than most of those mentioned in the article, who are generally either non-profit or just in the initial financing and design stages now.
-spc
Remember that stupid *.vbs script being passed around? Well it could have been running something really useful!
It should be renamed "the search of intelligence at Berkeley". Bastards wasted unbelievable amounts of computer time with that stupid bug that caused everyone to get the same segment of data to crunch. Then they couldn't get their act together on the website.
But the graphics sure are neat (and note, they don't run on NT servers because they need 256 colors DUH!)
Goddamn waste of time, and they certainly don't get MY machine time.
in this age of communication i'm just not getting through
I think the best distributed processing project I've been involved with is GIMPS, the Great Internet Mersenne Prime Search.
Mersenne numbers are numbers of the form 2^p-1 (2 to the pth power minus 1). Generally, when Mersenne numbers are mentioned, Mersenne numbers with prime exponents are what is actually meant. The Mersenne number 2^p-1 is abbreviated Mp.
A Mersenne prime is a Mersenne number Mp which is prime. The exponent p must itself be prime, otherwise Mp has a trivial factorization. Namely, if p is divisible by a and b, then 2^p-1 is divisible by 2^a-1 and 2^b-1. More generally, gcd(c^a-1,c^b-1)=c^(gcd(a,b))-1.
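The divisibility claim and the gcd identity are easy to check numerically. The following Python sketch is just a sanity check over small values, not a proof, and the helper names are made up for the example.

```python
from math import gcd

def mersenne(n):
    """Return the Mersenne number 2^n - 1."""
    return 2**n - 1

def identity_holds(c, a, b):
    """Check gcd(c^a - 1, c^b - 1) == c^gcd(a, b) - 1 for one triple."""
    return gcd(c**a - 1, c**b - 1) == c**gcd(a, b) - 1

# A composite exponent gives a trivial factorization:
# 15 = 3 * 5, so M15 is divisible by both M3 = 7 and M5 = 31.
print(mersenne(15) % mersenne(3), mersenne(15) % mersenne(5))

# Brute-force check of the general identity for small bases and exponents.
all_ok = all(identity_holds(c, a, b)
             for c in (2, 3, 10)
             for a in range(1, 25)
             for b in range(1, 25))
print(all_ok)
```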
So basically, what it boils down to is that you can test the primality of a Mersenne number a lot faster than that of an arbitrary number of the same size (using a Lucas-Lehmer test), so with a computer you can find REALLY big prime numbers. For example, the biggest prime number found to date is the Mersenne prime where p=6972593, which has 2,098,960 digits in it.
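For the curious, the Lucas-Lehmer test itself fits in a few lines. This is a plain textbook version in Python, fine for small exponents. It is not the GIMPS client's code, which relies on heavily optimized FFT-based multiplication. The digit-count helper uses the fact that 2^p-1 has the same number of decimal digits as 2^p.

```python
import math

def lucas_lehmer(p):
    """Return True if 2^p - 1 is prime. `p` must itself be prime."""
    if p == 2:
        return True          # M2 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # s_{k+1} = s_k^2 - 2 (mod Mp)
    return s == 0

def mersenne_digits(p):
    """Number of decimal digits of 2^p - 1."""
    return math.floor(p * math.log10(2)) + 1

candidates = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
print([p for p in candidates if lucas_lehmer(p)])
print(mersenne_digits(6972593))
```

The first line of output lists the exponents in the candidate set that give Mersenne primes, and the second reproduces the digit count quoted above. Each test needs p-2 squarings of a p-bit number, which is why an exponent near seven million takes a single PC a long time and why spreading different exponents across many machines works so well.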
The EFF is offering a $100k award to the first person to get a 10M digit prime number.
I highly suggest you switch from boring old D.Net or SETI@Home and go for finding big prime numbers
We also have been used (using loads and loads of Linux machines, I might add) to solve some extremely massive optimization problems (using over 1000 non-dedicated -- i.e. desktop -- machines at one time.) The problem in question has been around for 32 years, and was solved using Condor in 7 days!
So anyway, on all of those platforms we support checkpointing (restarting a job on another machine) and remote procedure calls (having a job on a remote machine think it's on your machine).
Plus you can download Condor right away and get it up and running! It's cool stuff, but then again I might be biased :)
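To show what checkpointing buys you on machines that can vanish at any moment, here is a small application-level sketch in Python. It is not how Condor does it, and Condor's own mechanism is not shown here. The file format, function name and `stop_after` parameter are invented for the example. The job saves its state after every step, so a later run, possibly on another machine that can see the checkpoint file, picks up where the last one stopped.

```python
import json
import os
import tempfile

def run_job(total_steps, checkpoint_path, stop_after=None):
    """Sum i*i for i < total_steps, saving progress after every step.

    Returns the final sum, or None if interrupted after `stop_after`
    steps (a stand-in for the desktop owner coming back).
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)          # resume from the last checkpoint
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None                   # evicted; the checkpoint is on disk
        state["acc"] += state["step"] ** 2
        state["step"] += 1
        done_this_run += 1
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)   # swap in the new checkpoint
    return state["acc"]

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.ckpt")
    first = run_job(10, ckpt, stop_after=4)   # interrupted part-way
    second = run_job(10, ckpt)                # resumes and finishes
    print(first, second)
```

Writing to a temporary file and renaming it means an eviction in the middle of a save leaves the previous checkpoint intact.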
Good Fast Cheap. Pick any two.
--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Place: Distributed.net HQ
Time: end of 198 year long search for the meaning of life.
"... and the answer is: ......"
"42."
"42? What the hell!?!."
Ham on rye, hold the mayo please.
thelocust[dot]org
The latest Linux cluster to go into production for the government nuclear stockpile simulation handily beats distributed.net
It doesn't even cost a whole lot of money. Lots of companies could afford a machine that size or larger.
If tits were wings it'd be flying around.
correct me if I'm wrong, but isn't this exactly what process tree is supposed to do? (eventually) they will essentially buy your extra processor cycles...
SETI@home is also looking for radar. Radar is very analog (generally pings at one frequency).
Will I retire or break 10K?
We've got volunteer/non-profit CPU cycle networks, and we're going to have at least one for-profit group starting up soon. I don't speak for everyone, but I am more likely to donate my cycles to a project that has a strong benefit for everyone, which is not done for profit motives. Why should I donate cycles to a project to make someone else rich? That said, I might be persuaded to *sell* cycles to a for-profit company provided it was worth my time.
If tits were wings it'd be flying around.
When reading the article, it occurred to me that massively distributed projects can only be really effective for tasks that don't require low latency. You can't exactly run Quake on a distributed supercomputer that goes over the internet, because by the time the packet returned with the end-results, the frame they were for would be many seconds in the past.
Distributed computing is currently only effective for things like Seti or Distributed.net where blocks can disappear into distributed space for hours before returning a result. For this reason, I can't see the current level of distributed technology taking off.
The second item, and possibly the most important, is getting people to run a distributed client itself. Think about it, people run Seti@Home because of an almost religious conviction that they might be able to help find extraterrestrial life. With distributed.net, it's all about the geek-romance of brute forcing huge keys. I can't see people getting passionate about speeding up financial forecasts or bragging to their friends how they helped render part of a frame of some undergrad's multimedia project.
People need to be passionately involved to run distributed clients. If you paid people for their distributed time, the total would probably come up to a few pennies a month. Most people would spend more than that in their own time simply downloading and installing the program!
Distributed computing on this scale can't be effective unless the users who offer their CPU ticks are passionately involved. Business models based on selling ticks are doomed to fail if they can't capitalize on emotional involvement in distributed projects. Money, as shocking as this may sound, just ain't enough for this application.
You substantially shorten the lifespan of your hardware by powering it on and off regularly.
There are a lot of factors, but thermal expansion/contraction is probably the most obvious.
DNA just wants to be free...
If you want to start your own project right now, today, go get the Mithral CS-SDK. It was pre-released a few days ago, and came out of the Cosm project.
It will let you put together a d.net/SETI style project in a few days (I would know). Finding something worth doing is up to you :)
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
GIMPS, the Great Internet Mersenne Prime Search, still needs your CPU cycles. It's good math, and can use all the CPU it can get, and it's found four of them already. It runs quietly in the background, and cooperates well with firewalls and with full-time or part-time internet connections. I don't recommend running it on laptops using batteries, since it eats power, but it's fine for any machine that's plugged in.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Just for the reasons described in the article. To rehash them briefly:
So what's left? 3d rendering with procedural textures, genetic algorithms, and proofs to obscure mathematical problems which require a large amount of trial and error. If there is such a thing, anyway... IANAM (Mathematician.)
You might also be able to do some sorts of 3d rendering with bitmapped textures, bumpmaps, and so on, as long as you are dumping the same person a sequence of scenes which all use the same textures. The problem is that you want to make very very sure that any time a user needs to have new code to solve your problems that they are able to veto it, or at least that it is sent in the most secure method possible. Further, the ONLY THING that any outside user should be able to send you is your datasets - Never new code. While this limits somewhat your ability to work, since you can't really implement a whole VM on the remote systems (due to space and memory constraints) that doesn't hurt you much.
The problem is that as you make a system more flexible you also make it more insecure. (Does this comment make my code look fat? ha ha.) And of course, flexibility is what will enable you to actually sell this CPU time to a variety of people - Not just enhance that ability. Without a great deal of flexibility you lose your ability to adapt to a wide variety of customer scenarios.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Using screensavers is a cool idea and all - but you can only have one screensaver set to run at a time, no? Can I run SETI@home and distributed.net simultaneously? (Not that I'd want to - but I might want to schedule some priorities so each would get equal time while I'm gone for a weekend).
Maybe if condor shipped with linux distribs, it'd make it easier for this technology to take off?
A very quick and easy search of the distributed.net site will show you that the "moo-ing" is still available with a 3rd party application.
In Soviet Russia...michael would be rotting in Siberia!
From the popularpower.com website:
Get to the front of the line for paying jobs by building a reputation now. By joining Popular Power during our preview period, you become a charter member, giving you prime positioning for paying jobs when they become available
If that isn't a paragraph ripped right out of "Schemes and Scams for Dummies" I don't know what is.
In Soviet Russia...michael would be rotting in Siberia!
I used to contribute many cycles on many machines to distributed.net, but I haven't recently. I have never contributed to SETI@home ever.
I lost my interest because the scientific and humanitarian benefit wasn't great enough. distributed.net dangled the carrot that breaking large keys would help to force Congress' hand regarding pathetically small key-lengths. Now that the current project has been running for an extremely long time, I think the value of that has run out. I just can't think of a good reason for wasting cycles and electricity on a problem that has no scientific or political value anymore.
SETI@home doesn't interest me either. That's not because aliens aren't cool - first contact would be an amazing thing and that's an understatement - but because they already have more power than they can use right now, and running a memory hungry client just isn't worth it for a pathetically small contribution to the project.
The Golomb ruler project is interesting, and it has real world value.
The new massively parallel computers are even faster than distributed.net, and those have the possibility of even greater future scaling. I think it's easier to build and coordinate a large beowulf than it is to coordinate a few tens of thousands of hobbyists. Throw in hacking and the occasional/inevitable corrupting of projects with bad data, and it becomes apparent that scaling of these distributed.net projects is very difficult. I'm not saying that it can't be done, but for a few million dollars you can build yourself a computer faster than distributed.net. If you were an oil company or a scientist working on a meaningful problem, which direction would you take?
If tits were wings it'd be flying around.
Yes. The idea that someone will actually make lots of money off of selling their cpu time on thousands of uncontrolled machines, with a non-confirmed number of participants who might leave at the drop of a hat, or who might even be competitors trying to skim out data for their own purposes, just doesn't make sense. Why would you get paid, even in micropayments, for this? Why would you want to turn your computer into a profit battleground, where the only thing that matters is that you made a buck a week on something using all of the resources instead of helping the priceless goals of the common good?
Also, where the *heck* do businesses have massively parallel problems in everyday life? This is a *very* specialized thing. I just don't see it coming.
-- dieman - Scott Dier
Fight Spammers!
Then, they turned around and encouraged others to apply 3rd party applications to restore "eye and ear candy" to the client.
Sounds like it was just a terrible thing to do.
The GUI client wasn't just the CLI with a GUI wrapper. It was a whole 'nother fork to the client mix. It was complicated, it was slow(er) and it caused many delays in the rollout of Win32 clients.
In Soviet Russia...michael would be rotting in Siberia!
There are very good reasons to not open source the code and they are all outlined on the website (at least they were last time I checked).
In Soviet Russia...michael would be rotting in Siberia!