Could Wikipedia Become a Supercomputer?
An anonymous reader writes "Large websites represent an enormous resource of untapped computational power. This short post explains how a large website like Wikipedia could make a tremendous contribution to science by harnessing the computational power of its readers' CPUs to help solve difficult computational problems." It's an interesting thought experiment, at least. If such a system were practical to implement, what kind of problems would you want it chugging away at?
all that guru meditation is bound to bring enlightenment 0.18
Infrastructure and projects already exist where people can donate their systems' computational power: http://boinc.berkeley.edu/
Uh, Linux geek since 1999.
"what kind of problems would you want it chugging away at?"
Well obviously the Answer to Life, the Universe and Everything
google SETI sometime. It was successful there because all involved had a common interest, NOT SO for wiki, so I'll put my money on 'flop'.
Only if bitten by a radioactive calculator!
Great minds think alike; fools seldom differ.
Wikipedia is a clusterfuck of little tiny fiefdoms. And you expect them to solve actual problems? hahahahaha.
...mining bitcoins.
Wikipedia could use that computing power to harvest bitcoins so that they'll never have to beg for money again. It's a brilliant plan.
I would prefer if wikipedia remained free to use. besides, javascript is 'evil'
Sure, it would come up with a solution pretty quickly, but then that solution would get edited, then the edit would be attacked by the supercomputer's moderating subroutine, then there would be a flame war on the discussion page occupying a large percentage of the total cycles. Then the solution would be locked and you couldn't see it or see a graph of it because there was no graph of it in the public domain.
If Slashdot were chemistry it would look like this: Cadaverine
"Clusterfuck of little tiny fiefdoms." That has to be the best description of wikipedia that I've ever heard.
The higher the technology, the sharper that two-edged sword.
Easy, wikipedia will use users' computational power to mine bitcoins. In this way they won't need any donations. Just wait.
After all, it is now being done with a rather blunderbuss approach. With all that extra processing power we could target people so much more effectively.
Is there a javascript bitcoin generator that can make money for me?
I could then inject the script into many websites by exploiting XSS vulnerabilities!
I wouldn't be surprised if zynga games generate bitcoin for them already or turn your computer into a general purpose compute resource for them.
This could be illegal under the UK's Computer Misuse Act unless specifically opted into. This also triggers the Data Protection Act and EU law, which effectively means the browser is opted out of this by default and signing up requires clear consent, so no burying it in 1000 pages of bullshit.
Just because an entity is a charity doesn't give it special rights.
Just think -- wikipedia can be changed in a few seconds by any schoolkid with an idea for some online graffiti -- would you want it chugging away at _any problems at all_?
-wb-
if i want to contribute computing power somewhere for free then there are ways to do it already
if wikipedia needs money, i can donate something or pay something.
But *please*, i use wikipedia often, maybe primarily, on my tablet. I don't think that abusing an ARM processor running on battery power, connected via an unstable and slow internet connection, will help a lot.
i have a car.
i have a rocket engine.
what's the problem?
Firefox is already slow enough. This would result in lots of angry Wikipedia users who don't use NoScript. Moreover, the extensive use of Javascript recently is growing out of control. I have to agree more and more with rms here.
Besides:
While Wikipedia's visitors read Wikipedia's entries, the CPUs of their computers are almost idle.
What makes you think that this is the case?
PluraProcessing has a cloud computing platform like the idea in this article. Customers pay Plura to perform computations and Plura outsources the computations to the browsers that are visiting its affiliates' websites. This is an interesting way to monetize the Web. Would you rather view ads or rent off some of your CPU / memory?
Let me tell you a little story:
Once upon a time, shortly after an asteroid impact wiped out the vacuum tubes; but before Steve Jobs invented aluminum, we had computers that plugged into the wall, with CPUs that ran all the time at pretty much the same power level. Even when idle. Back in those days, had most people's schedulers not kind of sucked, there may actually have been some "free" CPU time floating about.
Now, back to the present: On average, today's computer has a pretty substantial delta between power at full load and power at idle. This is almost 100% certainly the case if the computer is a laptop or embedded device of some kind (which is also where the difference in battery life will come to the user's notice most quickly). CPU load gets converted into heat, power draw, and fan noise within moments of being imposed.
Now, it still might be the case that wikipedia readers are feeling altruistic; but, if so, javascript is an unbelievably inefficient mechanism for attacking the sort of problems where you would want a large distributed computing system. A java plugin would be much better, an application better still, at which point you are right back to today, where we have a number of voluntary distributed computing projects.
If they wished to enforce, rather than persuade, they'd run into the unpleasant set of problems with people blocking/throttling/lying about the results of/etc. the computations being farmed out. Given wikipedia's popularity, plugins for doing so in all major browsers would be available within about 15 minutes. Even without them, most modern browsers pop up some sort of "a script on this page is using more CPU time than humanity possessed when you were born to twiddle the DOM to no apparent effect, would you like to give it the fate it deserves?" message if JS starts eating enough time to hurt responsiveness.
In summary: Terrible Plan.
Figuring out..
1) How to manipulate gravity so reactionless air-cars and spaceships are possible. As well as gravity on the spaceships as they travel. ...
2) Environmentally clean, cheap, abundant and easily mass-produced energy
3) FTL velocity for spacecraft (we need a way around the speed of light barrier).... we can't live in the cradle of humanity much longer
... against entropy.
There is plenty of raw computing power. Take BOINC for example: if you look at the projects, there is very little exciting. Seti@Home has been running for ages, you can do protein folding, you can do some mathematics that is interesting but hardly revolutionary. More computing power leads to marginally better weather forecasts. NP-complete problems will not yield to computing power - you only get a tiny bit farther.
I'll be interested to see if any /.ers can propose genuinely significant problems that would be solvable by a 100fold or even 1000fold increase in processing power.
Enjoy life! This is not a dress rehearsal.
Most likely, a system like this is so inefficient in terms of network usage vs potential computational power, plus added administrative overhead, that it would only waste bandwidth and electricity and be a net harm on a macro scale.
Better have the wikipedia servers, and other datacenters, run some boinc when idling. But they won't do that cause it's directly translated to the electricity bill. Network and cpu power are cheap, but still not free, and cpu's make up a large part of that power bill especially when used.
I do like the general idea, though, of 'giving some useful cpu time back' as thanks for using a free service. For example, as an alternative to the now-common advertising system. As long as it could be done efficiently, i see no objection at all, but i'm afraid it can't, at least not on a 'per-web-request' microscale.
A glitch a day keeps the bugs away.
Bitcoins could then have the credibility they deserve! [Citation needed]
farming!
-ducks-
don't forget where the solution is declared copyright by sony and your edits get "Suppressed" so that the history log is wiped.
Q: Will wikipedia become a supercomputer?
A: It turns out that there are stupid questions.
If you were blocking sigs, you wouldn't have to read this.
I have a better idea!
Instead of resorting to nuclear power, think of the untapped resource of the common household hamster!
All those wheels, spinning and turning - all that energy going to waste! Every hamster owning house should have a miniature turbine inside it, powered by the hamster. Think of the energy it'll generate! Why, after only a year, your single solitary hamster will probably have generated enough power to power a lightbulb for a few minutes! Assuming your hamster lives that long.
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Unused CPU capacity is not free to utilize. A CPU under load consumes much more power, so who is going to pay for that?
While you're in the movie, someone else could drive your car around! You aren't using it, and the gas is already paid for!
While you're at work, we could use your house for storage!
Or while you're waiting in line to checkout, you could stock shelves!
Not with JavaScript, but with Java. Using Java Applets is an old idea for implementing automatically loaded website-based distributed computing. Although, I haven't seen these clients anymore in a long time, so maybe the idea wasn't received so well.
I would get LulzSec to see if they can hack it :)
Skynetipedia jokes...
Or run Folding@Home and help cure cancer.
My CPU's are already used up scanning for malware.
Table-ized A.I.
Such workout machines would pay for themselves as people use them.
Nope.
Welcome to the new paradigm. Das Tubes moved your cheese.
Try this:
Employ researchers to correct Three Mistakes Per Article.
Anything so hopelessly confused as not to survive that metric gets tagged as Start Over.
I'd much rather broken information on any topic than elite info on more than seven topics.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Article: "While Wikipedia's visitors read Wikipedia's entries, the CPUs of their computers are almost idle."
Assumptions, Assumptions. How do they know? Personally I do tons of stuff and I use computers several years old - I notice if a web-page starts to kill my CPU and I quickly kill it.
The user's CPU is the user's, not the website's. They don't have the right to take it over without asking, no matter how altruistic the cause is.
Why not ask the user for permission? Well, if you're going to do that, why not just prompt users to download and install any of the many other programs that use idle CPU time for good causes? They could use an idle CPU much more efficiently than some Javascript on a webpage could.
The idea is fine, but honestly, unauthorized use of MY computer, even for a good cause, is going to get me the hell away from you and a very sternly-worded email sent.
Had you asked first, I might have agreed. But now? Nuh-uh.
http://hackaday.com/2009/03/03/distributed-computing-in-javascript/ - nothing new under the sun! However, to all those who say "it would be way too slow in JavaScript", I refer you to the entire OS in browser (previously on slashdot) http://bellard.org/jslinux/
There is NOT enough raw computing power, and there is certainly NOT enough that is available to those who could make use of it.
I am a scientist who is lucky enough to have unfettered access to one of the top 100 supercomputers on the planet (http://www.top500.org/), and I'm STILL limited computationally. Most researchers don't have access to a thousandth of this resource. I know that the modeling & simulation field is also computationally limited. Neither field is bumping up against NP problems, just very large ones. Luckily, they are often trivial to parallelize. If you like the fruits of science, there are a small army of researchers (hobbyist and professional) whom you could help with their significant problems.
As I see it, the problem is in the gatekeeper design of the volunteer systems (like BOINC). For many problems, it wouldn't be worth it to apply to BOINC, and try to motivate enough volunteers for a one-off run that would only take a few days on their system. Also, an entire infrastructure would need to be ported to run under BOINC.
There are solutions to this problem. A cloud (I apologize for using the buzzword), where a virtualized environment would be downloaded by volunteers once, and join into a cluster where vetted researchers can run arbitrary code. Then researchers who have problems that could be run in hours to days on a system like BOINC, but not in years on their own systems, could just log into the head node and launch their jobs. Several groups have most of the infrastructure built (CloVR / Science Clouds / Nimbus and Magellan / Eucalyptus), but the volunteer aspect is lacking.
To get back to the original post, would someone like to port Nimbus to run in the browser, and then load it on the non-mobile wikipedia?
cpubox
I wouldn't put it past some bureaucrat to think it's ok to use the mass population's computers to accomplish some task. So I am not apt to dismiss the story.
The article mentioned in the original post explicitly said "websites like Wikipedia". Why are all the comments aimed at Wikipedia? The poor sods have a hard time as it is. Someone mentions them as an EXAMPLE and everyone here is worried about their electric bill...
sigo ergo sum
There's a company that provides this as a revenue stream for flash game makers.
...only if that super computing power is used as a substitute for those falsity-mongering monkeys contributing `knowledge' under the aegis of that covetous W(h)ale(s). They actually dumbify the masses by their interactions.
Of course the problem with that is that in less than 100 years we went from blubbering all our communications into space to near silence, and we should assume that others would make similar leaps. Right now we could use it for our biggest threat: Compiling data on asteroids and comets to find those which are most at threat to earth. We could use it to monitor solar flare activity and magnetic field fluctuations on a planetary scale. Or we could use it to help make a larger scale model of the earth to help predict climate and plate tectonics
I would want to see scientific problems that the website publishers could solve for money distributed to the website consumers. That way sites like Wikipedia could fund their operation scaled to their audience.
Indeed I'd like to see a cross-website distributed credit accumulate, so I could purchase from websites against my accumulated credit from my computing on their behalf. Websites that split with me fairly, say 50-50%, their revenue from my computation would get my preferred business, weighted against their intrinsic value. Eg. I'd be more likely to read the same news story published by news sites that paid me more of what they got from my distributed computation.
--
make install -not war
"what kind of problems would you want it chugging away at?" evolutionary prediction. my moneys on 'Idiocracy' as an end answer.
In order for our computers to compute the cure to cancer, complex calculations would have to be parallelizable, then Wikipedia would have to divide the problem into smaller chunks, decide who gets which chunk, send it over the Internet (traffic that must be routed by computers), some computers will never return an answer because Adobe reader crashed a browser or any other reason - so Wikipedia needs to keep track of the subproblems that were not sent back and send them to someone else, and then Wikipedia would have to make sense of and synchronize all the asynchronous subproblems. All of these steps need computing power and given that the subproblems are supposed to be rather easy on our CPU, I'd guess that the overhead of wikipedia is greater than the gain. It wouldn't turn Wikipedia into a Supercomputer, it would require Wikipedia to become a supercomputer.
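The bookkeeping described above (hand out chunks, notice the ones that never come back, send them to someone else, combine the rest) can be sketched roughly as follows. This is a minimal illustration, not how Wikipedia or any existing project does it: the `WorkQueue` class, the 60-second lease, and the toy "sum a list" job are all assumptions made up for the example.

```python
from collections import deque


class WorkQueue:
    """Hands out chunks of a job and re-issues chunks whose result never returns."""

    def __init__(self, chunks, lease_seconds):
        self.chunks = chunks
        self.lease_seconds = lease_seconds        # hypothetical timeout per chunk
        self.pending = deque(range(len(chunks)))  # chunk ids waiting to be handed out
        self.leases = {}                          # chunk id -> time the lease expires
        self.results = {}                         # chunk id -> result from a client

    def checkout(self, now):
        """Give the next chunk to a client, reclaiming expired leases first."""
        for chunk_id, expires in list(self.leases.items()):
            if expires <= now:
                del self.leases[chunk_id]
                self.pending.append(chunk_id)
        if not self.pending:
            return None
        chunk_id = self.pending.popleft()
        self.leases[chunk_id] = now + self.lease_seconds
        return chunk_id, self.chunks[chunk_id]

    def submit(self, chunk_id, result):
        """Record a result; late duplicates for a finished chunk are ignored."""
        if chunk_id in self.results:
            return
        self.results[chunk_id] = result
        self.leases.pop(chunk_id, None)
        if chunk_id in self.pending:  # it may have been re-queued after a timeout
            self.pending.remove(chunk_id)

    def done(self):
        return len(self.results) == len(self.chunks)

    def combine(self):
        """Toy combine step: add up the partial sums in chunk order."""
        return sum(self.results[i] for i in range(len(self.chunks)))


def demo():
    numbers = list(range(100))
    chunks = [numbers[i:i + 25] for i in range(0, 100, 25)]
    queue = WorkQueue(chunks, lease_seconds=60)

    queue.checkout(now=0)              # a reader takes chunk 0 and closes the tab
    for _ in range(3):                 # three other readers finish their chunks
        chunk_id, data = queue.checkout(now=1)
        queue.submit(chunk_id, sum(data))
    assert not queue.done()            # chunk 0 is still missing

    chunk_id, data = queue.checkout(now=61)  # lease expired, chunk 0 is re-issued
    queue.submit(chunk_id, sum(data))
    return queue.combine()
```

Even this toy version shows where the server-side overhead comes from: every chunk costs a lease record, a timeout scan, and a result check, regardless of how little work the chunk itself contains.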
Been there done that, maybe want to consult Linux Cluster Urban Legends before you continue down this path
HPC for Primates. Read Cluster Monkey
Well, one person started to, then kinda went on a weird
other-topic rant.
The biggest issue, which makes this entire idea sound
pretty worthless: for the majority of Wikipedia users (I
presume, and have no idea of a source that would confirm
or refute it), what good is one or two minutes of computing
time?
Even the longest articles I might read on there are barely
5 minutes for me. I am a quick reader though.
Do many users 'stay' on the site for extended periods of
time? I honestly have never researched anything there for
any long stay. If I need to do serious research, no offense,
Wikipedia, but you are not going to be the source.
I guess you can break down the work, or only schedule
work that can be broken down into 1 minute chunks,
you could dole out work units based on the length of
article, with a maximum of maybe 3-5 minutes (the
average attention span) so when someone gives up on
"all the words" the work isn't lost.
Then you get into, how long is the download time of the
chunks. Will that be affected throughout the day, as
server latency scales up and down? Or localized traffic
scales up and down? That eats into compute time, since
you have to send the work unit back. Which may be an
order of magnitude larger.
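To put rough numbers on that, here is a back-of-the-envelope sketch of how much of a page visit is left for actual computing once the work unit has been downloaded and the result uploaded. The function name and every figure in the example calls are hypothetical, picked only to illustrate the shape of the problem.

```python
def useful_compute_seconds(dwell_s, download_mb, upload_mb, down_mbps, up_mbps):
    """Seconds left for computing after transferring the work unit both ways.

    Sizes are in megabytes and link speeds in megabits per second, so each
    size is multiplied by 8 before dividing by the link speed.
    """
    transfer_s = download_mb * 8 / down_mbps + upload_mb * 8 / up_mbps
    return max(0.0, dwell_s - transfer_s)


# One-minute visit, 1 MB work unit, 1 MB result, 8 Mbit/s down, 1 Mbit/s up.
small_result = useful_compute_seconds(60, 1, 1, 8, 1)

# Same visit, but the result is ten times the size of the work unit.
large_result = useful_compute_seconds(60, 1, 10, 8, 1)
```

With these made-up figures the first case leaves most of the minute for computing, while in the second the upload alone takes longer than the visit, so nothing useful gets done before the reader leaves.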
Next point... why not just create the "Wikipedia Distributed
Computer Project" and have frequent (or whomever) users
download a client and run it... because then it would be just
like all the others and then you see why the answer to this is...
1) Yes, Wikipedia could become a supercomputer.
[even though it wouldn't be Wikipedia in the sense that it was
THEIR computers.]
2) So, that makes it in a way... NO, they can't become a
supercomputer because of the feasibility, etc but they can
be a hub for a distributed network, which really isn't a
supercomputer
-AI
For me, it is far better to grasp the Universe as it really is than to persist in delusion
If wikipedia were to become self aware it might declare the entire human race non notable and a candidate for speedy deletion.
it would almost certainly turn into a supercomputer of sorts by allowing for the growth of a massive botnet.
!!.. USe it to fold proteins...
Wait... Shit.
I hit some random website (I don't remember which one) and suddenly my CPU usage pegged and the Java console popped up. The output on the console implied that a Java applet was mining bitcoins. Of course, I killed the browser process immediately.
A few years ago, I designed a Java "CPU leech" applet that would do things like this. Wasn't particularly difficult. I never actually built it; somebody else obviously did.
I wonder how many of these things are out there that are smart enough to throttle their CPU usage.
Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
If someone would build a browser-based distributed Hadoop + BigTable (with proper encryption and anonymization) we can have all the benefits of Google without the ads, scary corporate power, or privacy issues! I would leave my browser on their page and donate my CPU cycles and HD space. Where do I sign up?
googlebox
You don't fucking do that shit.
As long as it doesn't cause read/writes to a disk then it's fine. Also if it's running on a tablet or phone or any other kind of device where battery life is important or even low, it should detect that and throttle back. Maybe that sorta info should be accessible securely via Javascript to help enable that.
Well it's all good talking about volunteers and arbitrary code, but people don't just put coins in a plain white tin marked "charity" -- we like to choose who the beneficiaries of our goodwill are. Our donations make us feel involved, and therefore good about ourselves.
And on the flipside, there are things some people won't donate to. There are many people who wouldn't be happy having their CPU used for foetal stem cell research, for example. And some who would object to anything involving animal research. The anti-nuclear lobby would be against simulating new power station prototypes. And half the world would object to having weapons research (nuclear, biological, chemical or conventional) carried out on their PCs.
"Something for everyone" often goes hand in hand with "something against everyone", and rather than having an additive effect on the pool of volunteers, it has a subtractive effect.
HAL.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
I have been considering the possibilities of this, but without properly implemented WebGL, any sort of "supercomputing" is quite a way off. What might be feasible, though, is offloading lots of server-side computing load to clients, and making servers mostly object stores with privilege checks or client-encrypted data. There are three problems that I can see with this, though: how to partition the "application" so that server computing and network bandwidth needs are minimized while data security is not compromised, how to generate the client side code without succumbing to Javascript hell, and what would be computation-hungry applications that could benefit from this? Especially the balance of bandwidth cost and usefulness of the results seems a challenge. After thinking quite a while, I didn't really come up with meaningful problems that this solution would solve...
Offloading as much site content on client side on caches and Web storage would probably be much more beneficial to both the user and the site operator. But hell, that's not really at all as sexy, or require anything as technically fanciful.
One megabyte of transfer with Web storage munchable sizes (20 MB) from Amazon CloudFront costs about 0.675 millicents. User-initiated writes cost at least 10 millicents per megabyte. A second of 2.4 GHz Xeon core use costs about 0.015 millicents at EC2, at cheapest. If I calculated everything right, one megabyte of data transfer equals in price 4.6 to 68 seconds of computation on the server side. Put another way, it's worth transferring an extra 20 MB of data to the client if over 90 seconds of server-side computing can be avoided, and transferring the same amount extra from client to server is worth it if it saves over 1350 seconds of server time.
What tasks would it make sense to offload to clients in this way, also considering slower computing under Javascript, and thus higher latencies of getting the job done? Note that making more HTTP requests than a single 20 MB one would make offloading less cost-efficient. To me it seems that only highly cacheable (let's say over 99% hit rate, leading to computing times that don't make a user with that Javascript engine completely desperate) datasets make this meaningful, and when this is the case, work done on the client side consists mostly of page compositing. Not so fancy, really - wouldn't call it supercomputing...
Also, it might be worth noticing that for the cost of the couple of developers necessary to maintain a bleeding-edge architecture like this, it's possible to buy over a thousand continuously running cores as outlined above. That already packs quite a punch, in comparison to benefits that might be achievable by developing a highly complex platform to offload computation to clients. It's not likely that many sites would *really* need that.
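For anyone who wants to redo that arithmetic, here is a small sketch of the break-even calculation using the unit prices quoted above. The function name is made up for the example. Plugging the quoted prices in directly gives roughly 45 and 667 seconds per megabyte, about ten times the 4.6 to 68 second range stated, so one of the unit prices may be off by a factor of ten. A CPU price of 0.15 millicents per second (an assumption for illustration, not a figure from the post) reproduces the stated range and the 90-second figure for 20 MB.

```python
TRANSFER_OUT = 0.675  # millicents per MB, server to client (as quoted)
TRANSFER_IN = 10.0    # millicents per MB, client to server (as quoted, "at least")
CPU_SECOND = 0.015    # millicents per core-second (as quoted)


def break_even_seconds(megabytes, price_per_mb, cpu_price=CPU_SECOND):
    """Seconds of server CPU that cost the same as moving `megabytes` of data."""
    return megabytes * price_per_mb / cpu_price


# With the prices exactly as quoted:
out_quoted = break_even_seconds(1, TRANSFER_OUT)  # about 45 s per MB
in_quoted = break_even_seconds(1, TRANSFER_IN)    # about 667 s per MB

# With an assumed CPU price ten times higher (0.15 millicents per second):
out_assumed = break_even_seconds(1, TRANSFER_OUT, cpu_price=0.15)        # about 4.5 s
in_assumed = break_even_seconds(1, TRANSFER_IN, cpu_price=0.15)          # about 67 s
twenty_mb_out = break_even_seconds(20, TRANSFER_OUT, cpu_price=0.15)     # about 90 s
```

Either way the conclusion has the same shape: shipping data to the client only pays off when it replaces a substantial amount of server time, and shipping data back is more than ten times harder to justify.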
Let us be serious for a minute.
That kind of thing would require being able to harness serious computing power from within a web browser.
Web browsers are already struggling not to fall over, consume all of your RAM and crash your GPU doing nothing at all.
Performing some computation while loading Wikipedia pages would need to be done with Javascript, which is arguably one of the slowest programming languages ever. Even the latest JIT can barely make it play mp3 in real-time using the latest high-end PC.
That computation would need to be intensive enough so as to justify the costs of sending/retrieving the data.
It's just not going to work, and even if it could work, it wouldn't be practical. If I want to access information, I want to access it as fast as possible, not making my computer sluggish (which a lot of Javascript and/or Flash seems to do for unknown reasons) with some computation. I would end up just filtering it like I filter advertisements.
Wikipedia is a clusterfuck of little tiny fiefdoms.
If Wikipedia has come off this way to you, then perhaps you've run into one too many editors who routinely violate the policy against treating an article as a "fiefdom". If you can't resolve a problem with an article through the typical BOLD, revert, discuss cycle, other dispute resolution mechanisms are available.
[Wikipedia's] users are already used to loading a clean site without too many ads (except for various fundraising ads)
Aren't all ads "fundraising ads"?
This definition actually applies to facebook and probably all people who think like you
Hands off my useware. And stay off my lawn!
Probably much more effective to utilise spare capacity of datacentres and server farms.
For in-browser crunching it would be straightforward to implement this in Javascript. As soon as the page loads it starts crunching data in your browser. Not as efficient as native code but it would be easy enough to get something crude working. Given enough clients this would be an effective supercomputer.
1) Use the resulting supercomputer to simulate a neural network.
2) ???
3) Call it "Skynet"
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
It's been covered before http://tech.slashdot.org/story/09/03/03/1910207/Collaborative-Map-Reduce-In-the-Browser
Analyzing chess positions is one possible project that could utilize the method described in this article. For example, the Open Encyclopedia of Chess Openings, which encourages users to contribute computer analysis, would stand to benefit.