Harvesting & Reusing Idle Computer Cycles
Hustler writes "More on the University of Texas grid project's mission to integrate numerous, diverse resources into a comprehensive campus cyber-infrastructure for research and education. This article examines the idea of harvesting unused cycles from compute resources to provide this aggregate power for compute-intensive work."
Does anyone realize that running a CPU at 100% takes more electricity than running a CPU at 10%?
"wasted compute cycles" aren't free. I would assert they're not even "wasted".
Google's desktop search is one example where the timing and recovery back to the user is really done well.
There are several non-commercial distributed computing systems, so the GridMP system isn't anything particularly new or groundbreaking. However, in companies that run very resource intensive applications and simulations, such a distributed system that uses unused CPU cycles has some serious applications.
However, the most critical requirement of this type of system is not merely that the application in question be multithreaded, but that it be multithreaded using the GridMP APIs. Doing so would require either a significant rewrite of existing code or a rewrite from scratch. This is not a minor undertaking, by any means.
If the performance of the application matters and every cycle counts, then that investment is definitely worth it.
Jesus saved me from my past. He can save you as well.
REusing idle cycles? Really?
brwski
"Because without beer, things do not seem to go as well''
are harvesting spare cycles all the time. I don't think there are many cycles left over anymore!
Oh well, what the hell...
"Compute" as an adjective is just weird. Keep your creepy clustering terms to yourself kthx
This is a very insightful post, but has two crucial counterarguments
- Does anyone realize the cost of buying extra computers to handle peak computing loads?
- Does anyone realize the cost of idle high-tech, high-paid labor while they wait for something to run?
The proper decision would balance these three (and other factors) in defining a portfolio of computing assets that can cost-effectively handle both baseline and peak computing loads. Idle CPUs aren't free, but then neither are idle people or surplus (turned-off) machines.
Two wrongs don't make a right, but three lefts do.
Does anyone realize that running a CPU at 100% takes more electricity than running a CPU at 10%?
"wasted compute cycles" aren't free. I would assert they're not even "wasted".
And neither are the computer cycles reused, as the Slashdot article would have you believe.
How can you reuse something that was never used in the first place?
http://gridengine.sunsource.net/
Free and opensource, runs on almost all operating systems.
I thought that was what spyware was for? When you are not using your computer, and while you are using your computer too, let your computer send out e-mail and perform security audits on other Microsoft Windows computers! In exchange, you will get free, unlimited access to special money saving offers for products from many reputable companies, such as Pfizer.
Powered by caffeine and sugar; BSD
What you are saying was perfectly correct even 3 years or so ago.
But case in point: My Athlon64 computer doubles its wall-plug power draw (including everything: PSU, mainboard, HD, etc.) at 100% load compared to idle desktop (OK, Cool'n'Quiet helps push idle power down).
The CPU IS the biggest chunk besides some high-end GPUs (and even those need MUCH less power when idle), and modern CPUs need 3-4 times as much power under full load compared to idle.
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Could we really do this stunt? I see no reason why we could not. Dassault has done it.
Dassault, a French company, designed and tested its new Falcon 7X entirely in a virtual reality. The company did not create a physical prototype. Rather, the first build is destined for sale to the customer.
http://www.tomshardware.com/cpu/20050509/cual_core_athlon-19.html
60-100W difference between idle and full power consumption. That is not an insignificant amount of power.
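For a rough sense of scale, here is a back-of-the-envelope sketch of what that 60-100 W gap might cost if a machine were kept at full load around the clock. The electricity price of 0.10 per kWh is an assumption for illustration, not a figure from the linked article, and the function name is made up.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def extra_cost_per_year(extra_watts, price_per_kwh=0.10):
    """Yearly cost of drawing `extra_watts` more than idle, 24/7.

    price_per_kwh is an assumed rate; substitute your own.
    """
    extra_kwh = extra_watts * HOURS_PER_YEAR / 1000.0
    return extra_kwh * price_per_kwh


for watts in (60, 100):
    print(f"{watts} W extra: {extra_cost_per_year(watts):.2f} per year")
```

Multiply by the number of desktops on a campus and the point stands: the harvested cycles have a real price, even if it is small per machine.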
PVM offers both the spec and the implementation; MPI offers a newer spec with several solid implementations. But no, NIH syndrome prevails and another piece of half-baked software is born.
Where I work, the monstrosity uses Java RMI to pass the input data and computation results around -- encapsulated in XML, no less...
It is very hard to fight -- I did a comparison implementing the same task in PVM and in our own software. Depending on the weight of the individual computation being distributed, PVM was 10 to 300% faster and used 5 times less bandwidth. Upper management saw the white paper...
Guess what we continue to develop and push to our clients?
In Soviet Washington the swamp drains you.
How much energy does it take to harvest the energy?
How many cycles does it take to harvest the idle cycles?
Is the balance positive or negative?
Distributing computing processes to third parties is much less efficient. The workload has to be distributed in smaller packets, it has to be confirmed and rechecked more often, and the same workload has to be done multiple times because not everyone runs a dedicated machine or always has 'spare CPU cycles.'
I would agree that distributing the workload is cheaper in the long run, especially as the number of participants grows, but it is not a 1-to-1 cycle comparison, and therefore it is not necessarily 'that much cheaper', 'more efficient', or 'more prudent' for a research facility to rely on others for computing cycles.
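To put the "not a 1-to-1 comparison" point in numbers, here is a toy model of how many donated cycles end up as useful work. Every default value (redundancy of 3, 10% overhead, 50% availability) is an assumption picked for illustration, not a measurement from any real project, and the function is hypothetical.

```python
def effective_cycles(raw_cycles, redundancy=3, overhead=0.10, availability=0.5):
    """Estimate useful work from donated cycles.

    All default values are illustrative assumptions:
    redundancy   - how many volunteers compute each work unit
    overhead     - fraction of cycles lost to packaging and verification
    availability - fraction of time a volunteer machine is actually idle
    """
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    return raw_cycles * availability * (1.0 - overhead) / redundancy


donated = 1_000_000
print(effective_cycles(donated))
```

With these made-up numbers only a small fraction of the nominal cycles counts, so the cost comparison against a dedicated cluster should use the effective figure, not the raw one.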
Q: Will Sun make Java Technology Open Source? A: Sun's goal is to make Java as open as possible and available to the largest developer community possible. We continue to move in that direction through the Java Community Process (JCP). Sun has published the Java source code, and developers can examine and modify the code. For six years we have successfully been striking a balance between sharing the technology, ensuring compatibility, and considering the needs of a growing installed base of more than 2.5 million Java developers who depend on us. We are certainly evolving Java through the JCP to a model that works for all involved but that also ensures compatibility. Cross-platform compatibility has always been the key to Java's success and integrity; a notion we feel was protected by Microsoft's agreement in January 2001 to settle the lawsuit regarding Java technology.
I take it that's a 'no.'
XML UI Browser/Platform
If you have extra Macs, you can with DVD studio and Shake. Look up qmaster.
Mod point free since 2001
Heterogeneous Hardware - This is a major issue.
The kinds of things that interest high-end computing geeks tend to be extremely sensitive to round-off error.
If you're trying to get accurate results by spreading calculations around among disparate machines that might deploy e.g. IEEE 64-bit doubles, IEEE 96-bit doubles [Intel & AMD], IEEE 128-bit doubles [Sparc], or various hardware cheats [MMX, SSE, 3dNow, Altivec], then trying to make any sense of the results will drive you absolutely bonkers.
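The sensitivity shows up even on a single machine: floating-point addition is not associative, so merely changing the order in which partial results from different nodes are combined can change the answer. A minimal sketch, assuming ordinary IEEE 64-bit doubles (what Python's float is on common platforms); it does not reproduce the cross-hardware differences themselves.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two different groupings.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)

# math.fsum tracks the lost low-order bits, so its result does not
# depend on the order of the inputs.
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```

If a grid job sums contributions in whatever order the nodes report back, two runs of the same job can legitimately disagree in the last digits.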
PS: A good place to start in understanding the uselessness of e.g. 64-bit doubles is Professor Kahan's site at UC-Berkeley; you might want to glance at the following PDF files:
Will running these programs make my computer less reliable later? Shorten its productive life (2-3 years)?
I have a Dual 2.0 Mac that I leave running all the time because it also acts as my personal web server, and because it's just easier to leave the computer on (not asleep) all the time. I run Folding@home because I believe in the science and research and know that my contributions actually help good science. But the idea of wear and tear on the machine has crossed my mind, and I want to know what the negatives are of doing this to the machine (besides having to pay for the electricity).
Good point. Having to rewrite the application to make use of a parallel MPI can be a pain. Condor is a free, full-featured batch system that allows you to run apps on remote machines without having to recompile them.
The voltage from my idle memory cycles goes through a series of capacitors and ICs to make my fancy-dancy lights blink so I won't have to buy new computers and waste power - and all of this is within a 133 MHz underclocked pentium box with 32 MB ram running linux.
I'm saving the world!
I'll be your candy shop of infinite deliciousity if you'll be my discotheque of endless rump-shaking.
Ya, let's let countries such as China and N. Korea have such access to free engineering. After all, we want oppressive regimes to have as much power as possible over their own citizens. I mean, when was the last time YOU could fly your own jet? Such gaps between non-democratic governments and their citizens make much-needed revolutions that much harder to achieve.
Life is not for the lazy.
I saw some posters from the Fraunhofer Institute in Germany on the subject of power, with a graph of SPECint/watt.
0. All modern cores switch off idle things (like the FPU) and have done so for some time.
1. Those Opteron cores have best-in-class performance.
2. Intel Centrino cores, like the i740, have about double the SPECint/watt figure. That means they do their computation twice as efficiently.
In a datacentre, power and air conditioning costs are major operational expenses. If we can move to lower-power cores there, and have adaptive aircon that cranks back the cooling when the system is idle, the power savings would be significant. Of course, putting the datacentre somewhere cooler with cheap non-fossil-fueled electricity (like British Columbia) is also a good choice.
How about we do something that's a little more practical and useful, such as finding new drugs that will cure cancer?
Err, not precisely. Intel's Pentium M can create a system that draws 132 watts at maximum CPU load, and runs nearly as fast.
I've been buying AMD for about five years, but I think my next system will be a Pentium M. Just as soon as they're a bit cheaper...
--grendel drago
Laws do not persuade just because they threaten. --Seneca
Seriously. We're talking about literally a 30 year old idea. By now it should really be built into every OS sold. The default configuration for every machine put on a network should link it into the existing network queueing system that you all have running at your sites.
Government of the people, by corporate executives, for corporate profits.
Your choices are:
Note that the solution in this article is obviously not free due to electricity and other support costs, but it is undoubtedly cheaper than buying your own cluster and then paying for electricity and the support costs.
There is increased wear and tear associated with running a computer. However - in university environments, this may not matter. At the university where I did my undergrad work, and now at the current one where I work, all general student-use computers in labs are replaced on a three-year basis. At any one time, there is a huge glut of just-barely-not-newest computers to be had. So shortening the lifespan of these machines really won't matter. The lab boxes are on most of the time anyway, and will be rotated out before they break.
The Wisconsin Condor Project has been harvesting unused compute cycles for over a decade. The software is free to use and deploy, and is used by various corporations including Western Digital and others.
Hmm, where have I heard about this before again?
Exciting to read a paper on this fantastic new idea.
Beware: In C++, your friends can see your privates!
When I'm doing pedestrian things - read anything but games, videos, or high-end graphics work - my graphics card is underutilized.
Wouldn't it be cool to utilize it to its full potential?
Even better, when the screen saver would normally kick in, just turn over the graphics card completely to the background process.
Imagine Seti@home running on your GPU.
PS: Ditto some other processors that aren't being used to their full capacity.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Our new buses are the exact same - designed in CAD - no prototype phase - first production models were sold.
And they are shit.
Flimsy, awkward, handle like a drunken whale, weak brakes, and parts you *physically cannot get to*.
There is a very good reason for prototypes - you get to see what breaks *before* you invest in production tooling and large material and parts purchases.
They're gonna lose their ass on that...
Why can't I mod "-1 Idiot"?
So what should I have done with that CPU power?
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists.
BURP is a project of a similar concept built off of BOINC. I'd link to it but I don't have it. Just Google it.
I am Spartacus
How JAVA's Floating-Point Hurts Everyone Everywhere
That presentation was done in 1998.
That'd be seven years ago...
Ever heard of java.lang.StrictMath? Didn't think so. Been around since Java 1.3. Current version is 1.5.
This isn't just about parallel computing - in fact if you'll read the article you'll see that they're using MPI for handling parallelism! Grid computing isn't about reinventing inter-node communications - it's more about inventing inter-node scheduling.
Your cluster - is it so fast that you're never stuck waiting for jobs to finish? If not, then you could probably benefit from being able to borrow time on someone's larger system. Is your cluster so well-utilized that the load's always around 1? If not then you've probably got spare capacity that someone else could benefit from. The fact that both you and those others are using MPI is necessary but insufficient to allow you to cooperate.
First, I completely agree with you that it does depend a lot on what you're doing. For instance, last I heard _cycle for cycle_ the Pentium is still the king of integer - such as chess. But the crown is different for flops... Raw clock is meaningless and you are highly misguided. Furthermore, MANY operations are multicycle, and I guarantee you they are used on anything mathematically intense enough to be worth sending out over the network.
Interestingly, you don't have to leave Intel to see this: the Celeron, "vanilla" P4, Xeon and Pentium M have a lot of good differences just within the current Intel x86 line. The Pentium M is awesome, for instance.
But I'll provide just a few examples of how a cycle is NOT a cycle.
Many of these only help you if you compile for that architecture or it does something fancy in the background to compensate - but you could certainly distribute a mixed exe that ran the appropriate binary for the platform.
- First, bitwidth:
64 bit addition requires 1 cycle on a 64 bit cpu, but at least about 3 on a 32 bit. 64 bit multiplication is MUCH worse on a 32bit machine. Similarly, 128 bit vector math is much cheaper on a G4 ("altivec") than on a CPU limited to 64 or 32 bits in that arena.
- registers: A CPU can only actually DO operations on values in registers. If you have more registers you can do much more complicated (longer-chained) operations without having to go to RAM or cache. This is intensely true on highly serial but complicated math and amazingly significant if the operation data actually fits in registers in one CPU and not in another.
- branch prediction and shorter pipeline depth. All other things being equal you want the shortest pipeline possible because it means you have the lowest branch prediction penalty. Coupled with the quality of your branch predictor, this makes a big difference. (Of course, things _aren't_ equal, and longer pipelines make it easier to physically build faster CPUs) Even if branch prediction is meaningless, the pipeline depth is still important.
- parallelization: _All_ modern computers let you run multiple commands in parallel using multiple CPUs, cores, hyperthreading and/or multiple processing units. Many computers come with two CPUs. Some newer CPUs come with two cores. Hyperthreading decreases the process switching penalty. Modern CPUs have separate integer and flop units, often more than 1. Clearly the quantity and efficiency of these multiple units would make a big difference.
At an absolute minimum, all of these things help you run the OS without interfering too much with your actual work. But since we're talking about stuff that's already being distributed over a wide network to multiple computers, on some level this work is clearly parallelizable. Even if your second core can't help on your first 'chunk', you could likely be executing two chunks at nearly the same speed (barring other constraints listed here).
- cache (L1/L2/L3), cache prediction, RAM, bandwidth, chipsets. I'm not going to go into all the details, but suffice it to say that the cores need data and code to function, and unless your entire process fits in registers, they have to get it from somewhere. The arrangement of memory has a big impact on 1) how much work the CPU has to do to get information and 2) how much the CPU has to wait for that information.
- I/O - I know this is outside our case, and the CPU efficiency of IDE has increased dramatically, but there is still some variance from system to system and driver to driver. Furthermore, different network cards/drivers use significantly different amounts of CPU time to send large amounts of data. This is true even if the speed of execution is not I/O bound - it still takes some main processor clocks and the quantity varies.
Furthermore, this arbitrary driver code and any OS code - for instance - is definitely susceptible to traditional branch prediction, cache hits, etc - even if your main crunching loop did fit in registers.
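To make the bitwidth point above concrete, here is a sketch of what a 64-bit addition looks like when all you have are 32-bit-sized pieces: add the low words, extract the carry, then add the high words plus the carry. It is written in Python purely for illustration (the function name is made up), so it shows the extra steps rather than measuring real cycle counts.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit integers using only 32-bit-sized pieces."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo                      # step 1: add the low words
    carry = lo >> 32                      # step 2: did the low add overflow?
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # step 3: add high words plus carry

    return (hi << 32) | lo                # result wraps modulo 2**64


print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))
```

A 64-bit CPU does the same thing in a single add, which is the sense in which a cycle on one machine is not a cycle on another.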
I'm sure there's more, but I'm done for now.