Collaborative Map-Reduce In the Browser

Posted by kdawson on Tuesday March 3, 2009 @08:50AM from the supercomputer-on-the-very-cheap dept.

igrigorik writes "The generality and simplicity of Google's Map-Reduce is what makes it such a powerful tool. However, what if instead of using proprietary protocols we could crowd-source the CPU power of millions of users online every day? Javascript is the most widely deployed language — every browser can run it — and we could use it to push the job to the client. Then, all we would need is a browser and an HTTP server to power our self-assembling supercomputer (proof of concept + code). Imagine if all it took to join a compute job was to open a URL."

23 of 188 comments

Random Thoughts by AKAImBatman · 2009-03-03 08:51 · Score: 5, Interesting

Two comments:
1. He places the map/emit/reduce functions in the page itself. This is unnecessary. Since Javascript can easily be passed around in text form, the packet that initializes the job can pass a map/emit/reduce function to run. e.g.:

var myfunc = eval("(function() { /*do stuff*/ })");

In fact, the entire architecture would work more smoothly using AJAX with either JSON or XML rather than passing the data around as HTML content. As a bonus, new types of jobs can be injected into the compute cluster at any time.
2. Both Gears and HTML5 have background threads for this sort of thing. Since abusing the primary thread tends to lock the browser, it's much better to make use of one of these facilities whenever possible. Especially since multithreading appears to be well supported by the next batch of browser releases.
(As an aside, I realize this is just a proof of concept. I'm merely adding my 2 cents worth on a realistic implementation. ;-))
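A minimal sketch of point 1, assuming a job packet sent as JSON whose map and reduce functions travel as source text. The field names (jobId, map, reduce, data) and the runJob helper are hypothetical and are not taken from the proof of concept.

```javascript
// Hypothetical job packet: the field names are assumptions for illustration.
const packet = JSON.stringify({
  jobId: 1,
  map: "(function (word) { return [word.toLowerCase(), 1]; })",
  reduce: "(function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); })",
  data: ["Map", "reduce", "map"]
});

// Client side: parse the packet and turn the function source back into code.
// eval runs whatever the server sent, so this only makes sense for a trusted server.
function runJob(packetText) {
  const job = JSON.parse(packetText);
  const map = eval(job.map);
  const reduce = eval(job.reduce);

  // Group the emitted [key, value] pairs by key.
  const groups = {};
  job.data.forEach(function (item) {
    const pair = map(item);
    (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
  });

  // Reduce each key's values to a single result.
  const results = {};
  Object.keys(groups).forEach(function (key) {
    results[key] = reduce(key, groups[key]);
  });
  return { jobId: job.jobId, results: results };
}

const result = runJob(packet);
console.log(JSON.stringify(result)); // {"jobId":1,"results":{"map":2,"reduce":1}}
```

Because the functions arrive with the data, a different kind of job only needs a different packet, not a different page.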

--
Javascript + Nintendo DSi = DSiCade
1. Re:Random Thoughts by maxume · 2009-03-03 12:45 · Score: 3, Interesting
  
  I found this somewhat startling:
  http://code.google.com/p/doctype/wiki/ArticleHereComesTheSun
  If you create a javascript object named 'sun' (or several other names), netscape and family (including firefox) load java into memory.
  
  --
  Nerd rage is the funniest rage.
Botnet by ultrabot · 2009-03-03 08:51 · Score: 3, Insightful

Imagine how much *spam* you could send using this approach.
No, wait...

--
Save your wrists today - switch to Dvorak
1. Re:Botnet by MonoSynth · 2009-03-03 08:58 · Score: 5, Insightful
  
  With ever-increasing JavaScript performance, there's a lot of CPU power available for cracking passwords and CAPTCHAs... Just include the code in an ad and you're done. No tricky installs needed, just the idle time of the user's web browser.
Join compute cloud by Imagix · 2009-03-03 08:51 · Score: 4, Insightful

We already have that. See botnets.
BOINC by Chabo · 2009-03-03 08:55 · Score: 4, Insightful

If you were really interested enough to donate your CPU cycles, is it really that much harder to install BOINC, and get a job running?
Plus then you can run native code instead of having to run in [shudder]Javascript[/shudder].

--
Convert FLACs to a portable format with FlacSquisher
Noscript by sakdoctor · 2009-03-03 08:56 · Score: 4, Informative

Progress is running less JavaScript, not more.
1. Re:Noscript by OzPeter · 2009-03-03 09:03 · Score: 5, Funny
  
  Sir, I have the '80s on hold on the phone at the moment. They want to know if you want to buy some stuff called .. umm .. hang on .. yes here it is .. "static HTML pages" ..
  
  --
  I am Slashdot. Are you Slashdot as well?
2. Re:Noscript by wirelessbuzzers · 2009-03-03 09:10 · Score: 4, Insightful
  
  Actually it was the '90s, but whatever. The thing is, non-DHTML web pages are actually pretty good for most things... what made those early '90s web pages so awful was no CSS, slow connections, and the fact that people really didn't know how to design for this new medium.
  Probably 99% of the web still shouldn't need Javascript or Flash, though pages usually do need to be dynamic on the server side.
  
  --
  I hereby place the above post in the public domain.
Would this work for music? by Anonymous Coward · 2009-03-03 08:56 · Score: 4, Funny

You could also use this to index the MP3 files on everybody's hard drives, then share the music just by visiting a URL!! ... oh wait...
Why? Why? WHYWHYWHYWHY??? by wirelessbuzzers · 2009-03-03 09:03 · Score: 5, Insightful

Javascript really isn't suited for this kind of thing, even with worker threads, for two reasons I can think of. First, web clients are transient... they'd have to report back often in case the user clicks away.
But more importantly, Javascript just isn't a good language for massive computation. It only supports one kind of number (double), has no vectorization or multicore capabilities, has no unboxed arrays, and even for basically scalar code is some 40x slower than C, let alone optimized ASM compute kernels. (This is for crypto on Google Chrome. Other browsers are considerably slower on this benchmark. YMMV.)
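A small illustration of the "one kind of number" point, using only standard language behavior. The variable names are chosen here for illustration.

```javascript
// Every JavaScript number is an IEEE 754 double, so integers are exact only up to 2^53.
const limit = Math.pow(2, 53);               // 9007199254740992
const lostPrecision = (limit + 1 === limit); // true: 2^53 + 1 is not representable

// Bitwise operators convert their operands to 32-bit signed integers,
// so a hash or crypto kernel has to manage wraparound and sign itself.
const wrapped = (0xFFFFFFFF | 0);            // -1
const unsigned = (wrapped >>> 0);            // 4294967295

console.log(lostPrecision, wrapped, unsigned); // true -1 4294967295
```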

--
I hereby place the above post in the public domain.
Link by Jamamala · 2009-03-03 09:06 · Score: 5, Informative

for those like myself that had no idea what MapReduce was:
http://en.wikipedia.org/wiki/MapReduce
1. Re:Link by MarkGriz · 2009-03-03 09:24 · Score: 5, Insightful
  
  Thank you. Nice that we have "volunteer" editors, since slashdot doesn't seem to employ them any longer.
  
  --
  Beauty is in the eye of the beerholder.
2. Re:Link by Logic and Reason · 2009-03-03 09:48 · Score: 3, Funny
  
  Thank you! I had completely forgotten how to use Google!
3. Re:Link by Anonymous Coward · 2009-03-03 10:09 · Score: 4, Funny
  
  So had I; here's a link: http://www.google.com
MapReduce fanboyism by Estanislao Martínez · 2009-03-03 09:13 · Score: 5, Insightful

Oh, please, make the MapReduce fanboyism stop.
Yes, it's a neat technique. It's also very old and obvious. Google's implementation is also good, but this stuff is just not rocket surgery. It's just a simple pattern of how to massively parallelize some types of computational tasks.
But somehow, just because some dudes at Google wrote a paper about it, it's become the second coming of Alan Turing or something among some silly folks. Hell, a couple of weeks ago somebody was saying in the comments here that MapReduce was a good alternative to relational databases. Now that is silly.

--
Are you adequate?
1. Re:MapReduce fanboyism by Anonymous Coward · 2009-03-03 09:22 · Score: 3, Funny
  
  listen to him everyone, he must know what he's talking about since I don't know what rocket surgery is.
Bandwidth and Exercise by clinko · 2009-03-03 09:20 · Score: 3, Insightful

A common mistake in multi-server builds is assuming that bandwidth is free.
Bandwidth costs money and time. Both are reduced by having the network closer to the processing. This is one of the reasons Google bought all that "dark fiber" left around after the dot-com bust.
Another flaw is that it is difficult to get "good results" from computing on data in blocks, unless you're doing something like relativity matrices (think PageRank).
Something to think about:
If I'm sending names to your pc, what can I derive from that list without having the entire list?
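One way to look at that question, as a sketch with made-up data: a client can compute anything that combines cleanly across chunks, such as counts, but not anything that needs to see the whole list, such as duplicates. The chunks and both helper functions below are hypothetical.

```javascript
// Hypothetical chunks of a name list, as two different clients might receive them.
const chunkA = ["Alice", "Bob", "Anna"];
const chunkB = ["Brian", "Carol", "Alice"];

// Work a client can do on its own chunk: count names by first letter.
function countByInitial(names) {
  const counts = {};
  names.forEach(function (name) {
    const initial = name.charAt(0);
    counts[initial] = (counts[initial] || 0) + 1;
  });
  return counts;
}

// Server side: partial counts combine by simple addition.
function mergeCounts(a, b) {
  const merged = {};
  [a, b].forEach(function (part) {
    Object.keys(part).forEach(function (key) {
      merged[key] = (merged[key] || 0) + part[key];
    });
  });
  return merged;
}

const totals = mergeCounts(countByInitial(chunkA), countByInitial(chunkB));
console.log(JSON.stringify(totals)); // {"A":3,"B":2,"C":1}
```

Note that "Alice" appears in both chunks, and neither client can tell on its own. Deduplication would need the names themselves to travel back, which is where the bandwidth cost returns.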
Pay Me by Doc Ruby · 2009-03-03 09:21 · Score: 4, Interesting

If there were a couple-few or more orgs competing to use my extra cycles, outbidding each other with money in my account buying my cycles, I might trust them to control those extra cycles. If they sold time on their distributed supercomputer, they'd have money to pay me.
As a variation, I wouldn't be surprised to see Google distribute its own computing load onto the browsers creating that load.
Though both models raise the question of how to protect that distributed computing from being attacked by someone hostile, poisoning the results to damage the central controller's value from it (or its core business).

--
make install -not war
Scripts taking too long by Anonymous Coward · 2009-03-03 09:23 · Score: 4, Funny

Is this why, for the last few months, my browser keeps telling me that scripts on the Slashdot main page are taking too long and asking whether I want to stop them?
Re:Why? Why? WHYWHYWHYWHY??? by Instine · 2009-03-03 09:42 · Score: 4, Insightful

and you don't think you could get 100 times more users to visit your web app than you could convince to download and install an exe?

--
Because you can - or because you should?
Re:A bunch of problems by AKAImBatman · 2009-03-03 10:08 · Score: 3, Insightful

Further down in the Slashdot comments, a poster also pointed out that Javascript is a poor platform for computationally intensive work. Which I agree with on a general level. The Javascript number system is designed for genericity, not performance.
In the end this is just a cute idea that has any number of practical problems. Many of them reflect the fact that distributed computing is hard, but many of them also reflect the fact that the suggested platform is less than ideal for this function. Especially if you're going to be pushing workloads that take more time and resources to transmit back and forth than to simply compute them.
Doesn't stop me from humoring him, though. We all have to dream. ;-)
And besides, this may just inspire the next fellow down the line to use the technology for a more practical purpose.

--
Javascript + Nintendo DSi = DSiCade
Re:Why? Why? WHYWHYWHYWHY??? by Nebu · 2009-03-03 10:25 · Score: 3, Insightful

Javascript really isn't suited for this kind of thing, even with worker threads, for two reasons I can think of. First, web clients are transient... they'd have to report back often in case the user clicks away.
I don't see why web clients being transient is a problem. The whole point of the MapReduce algorithm is that each worker (a web client in this case) doesn't need to know anything about what the other workers are doing, what the system as a whole is doing, or what it did on any past job.
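A rough sketch of why a vanished client need not matter, assuming a coordinator that simply hands an unfinished job to the next client once a timeout passes. The createCoordinator and work functions, the timeout, and the explicit clock argument are all hypothetical.

```javascript
// Hypothetical in-memory job table for a coordinator; names are illustrative.
function createCoordinator(chunks, timeoutMs) {
  const jobs = chunks.map(function (chunk, id) {
    return { id: id, chunk: chunk, assignedAt: null, result: null };
  });

  return {
    // Hand out any job that is unfinished and either unassigned or timed out.
    nextJob: function (now) {
      for (const job of jobs) {
        const expired = job.assignedAt !== null && now - job.assignedAt >= timeoutMs;
        if (job.result === null && (job.assignedAt === null || expired)) {
          job.assignedAt = now;
          return { id: job.id, chunk: job.chunk };
        }
      }
      return null;
    },
    // Record a result; a late duplicate for a finished job is ignored.
    submit: function (id, result) {
      if (jobs[id].result === null) jobs[id].result = result;
    },
    done: function () {
      return jobs.every(function (job) { return job.result !== null; });
    },
    results: function () {
      return jobs.map(function (job) { return job.result; });
    }
  };
}

// Each worker sees only its own chunk; here the work is summing numbers.
function work(job) {
  return job.chunk.reduce(function (a, b) { return a + b; }, 0);
}

const coordinator = createCoordinator([[1, 2], [3, 4]], 1000);
const first = coordinator.nextJob(0);     // client 1 takes job 0, then closes the tab
const second = coordinator.nextJob(10);   // client 2 takes job 1
coordinator.submit(second.id, work(second));
const retry = coordinator.nextJob(2000);  // job 0 has timed out and is handed out again
coordinator.submit(retry.id, work(retry));
const total = coordinator.results().reduce(function (a, b) { return a + b; }, 0);
console.log(coordinator.done(), total);   // true 10
```

The cost of a lost client is only the repeated chunk, so smaller chunks waste less work at the price of more round trips.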