Collaborative Map-Reduce In the Browser

Posted by kdawson on Tuesday March 3, 2009 @08:50AM from the supercomputer-on-the-very-cheap dept.

igrigorik writes "The generality and simplicity of Google's Map-Reduce is what makes it such a powerful tool. However, what if instead of using proprietary protocols we could crowd-source the CPU power of millions of users online every day? Javascript is the most widely deployed language — every browser can run it — and we could use it to push the job to the client. Then, all we would need is a browser and an HTTP server to power our self-assembling supercomputer (proof of concept + code). Imagine if all it took to join a compute job was to open a URL."
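
For readers unfamiliar with the pattern the summary refers to, here is a minimal sketch of a map step and a reduce step, written in Python rather than browser Javascript for brevity. The word-count job, the `chunks` data, and the function names `map_chunk` and `reduce_pairs` are illustrative assumptions and are not taken from the linked proof of concept.

```python
from collections import defaultdict

def map_chunk(text):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in text.split()]

def reduce_pairs(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input, split into chunks as a job server might hand them out.
chunks = ["the quick brown fox", "the lazy dog", "The end"]

# Each chunk could be mapped by a different client; here one process does it all.
emitted = [pair for chunk in chunks for pair in map_chunk(chunk)]
word_counts = reduce_pairs(emitted)
print(word_counts["the"])
```

Because `map_chunk` looks only at its own chunk, the chunks can in principle be processed in any order and on any machine before the results are combined.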

20 of 188 comments

Botnet by ultrabot · 2009-03-03 08:51 · Score: 3, Insightful

Imagine how much *spam* you could send using this approach.
No, wait...

--
Save your wrists today - switch to Dvorak
1. Re:Botnet by MonoSynth · 2009-03-03 08:58 · Score: 5, Insightful
  
  With ever-increasing JavaScript performance, there's a lot of CPU power available for cracking passwords and CAPTCHAs... Just include the code in an ad and you're done. No tricky installs needed, just the idle time of the user's web browser.
Join compute cloud by Imagix · 2009-03-03 08:51 · Score: 4, Insightful

We already have that. See botnets.
BOINC by Chabo · 2009-03-03 08:55 · Score: 4, Insightful

If you were really interested enough to donate your CPU cycles, is it really that much harder to install BOINC, and get a job running?
Plus then you can run native code instead of having to run in [shudder]Javascript[/shudder].

--
Convert FLACs to a portable format with FlacSquisher
1. Re:BOINC by Chabo · 2009-03-03 10:06 · Score: 2, Insightful
  
  A big thing is the same thing people have against VB: there may not be anything technically wrong with it, but bad programmers are drawn to it because it's easy, so you hardly ever see a good VB program. There's especially nothing wrong with VB now, when writing a program in VB.NET gets you the same result as if you'd written it in C#: you still get CIL code when it's compiled.
  However, Javascript gets used for way too much, and historically it's been a huge browser security issue. Even if you use it responsibly, that doesn't mean everyone does.
  
  --
  Convert FLACs to a portable format with FlacSquisher
Why? Why? WHYWHYWHYWHY??? by wirelessbuzzers · 2009-03-03 09:03 · Score: 5, Insightful

Javascript really isn't suited for this kind of thing, even with worker threads, for two reasons I can think of. First, web clients are transient... they'd have to report back often in case the user clicks away.
But more importantly, Javascript just isn't a good language for massive computation. It only supports one kind of number (double), has no vectorization or multicore capabilities, has no unboxed arrays, and even for basically scalar code is some 40x slower than C, let alone optimized ASM compute kernels. (This is for crypto on Google Chrome. Other browsers are considerably slower on this benchmark. YMMV.)
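
A small sketch of what the "one kind of number (double)" complaint means in practice. Python's `float` is an IEEE 754 double, so it stands in here for a Javascript number as described in this 2009 post; the constant names are illustrative assumptions.

```python
# Python's float is an IEEE 754 double, standing in for a Javascript number.
LIMIT = 2 ** 53  # above this, not every integer is exactly representable as a double

as_double = float(LIMIT)
# Adding 1 is lost to rounding: 2**53 + 1 rounds back to 2**53.
collides = (as_double + 1.0 == as_double)

# Exact 64-bit integer arithmetic (common in hashing and crypto code) needs a real
# integer type; here Python's arbitrary-precision int is masked to 64 bits.
MASK64 = (1 << 64) - 1
exact = (LIMIT + 1) & MASK64
print(collides, exact - LIMIT)
```

With only doubles available, code that needs exact wide integers has to emulate them from smaller pieces, which is one plausible source of the slowdown described above.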

--
I hereby place the above post in the public domain.
Re:Noscript by wirelessbuzzers · 2009-03-03 09:10 · Score: 4, Insightful

Actually it was the '90s, but whatever. The thing is, non-DHTML web pages are actually pretty good for most things... what made those early '90s web pages so awful was no CSS, slow connections, and the fact that people really didn't know how to design for this new medium.
Probably 99% of the web still shouldn't need Javascript or flash, though pages usually do need to be dynamic on the server side.

--
I hereby place the above post in the public domain.
MapReduce fanboyism by Estanislao Martínez · 2009-03-03 09:13 · Score: 5, Insightful

Oh, please, make the MapReduce fanboyism stop.
Yes, it's a neat technique. It's also very old and obvious. Google's implementation is also good, but this stuff is just not rocket surgery. It's just a simple pattern of how to massively parallelize some types of computational tasks.
But somehow, just because some dudes at Google wrote a paper about it, it's become the second coming of Alan Turing or something among some silly folks. Hell, a couple of weeks ago somebody was saying in the comments here that MapReduce was a good alternative to relational databases. Now that is silly.

--
Are you adequate?
I stopped at... by greymond · 2009-03-03 09:17 · Score: 2, Insightful

"Javascript... every browser can run it..."
There is a huge difference between being able to run Javascript apps and running them well. Not to forget that a lot of the Javascript I see out there really only works on PCs with IE or Firefox. Opera and Safari, especially on OS X, seem to have trouble with some sites that aren't coded for compatibility but were instead pushed out quickly with little regard for anything other than IE on Windows.

--
Ave Molech Setting
Bandwidth and Exercise by clinko · 2009-03-03 09:20 · Score: 3, Insightful

A common mistake in multi-server builds is assuming that bandwidth is free.
Bandwidth costs money and time. Both are reduced by having the network closer to the processing. This is one of the reasons Google bought all that "dark fiber" left around after the dot-com bust.
Another flaw is that it is difficult to get "good results" from computing on data in blocks unless you're doing relativity matrices (think PageRank).
Something to think about:
If I'm sending names to your pc, what can I derive from that list without having the entire list?
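
One way to read that question, sketched under assumed data: some summaries of a partial list combine exactly into a summary of the whole list, and others do not. The names and the two-chunk split below are made up for illustration.

```python
from collections import Counter

# Hypothetical chunks of a name list, as two different clients might receive them.
chunk_a = ["Ann", "Ann", "Bob", "Bob", "Bob"]
chunk_b = ["Ann", "Ann", "Cat", "Cat", "Cat"]

count_a, count_b = Counter(chunk_a), Counter(chunk_b)

# Mergeable: per-chunk counts add up to exactly the counts over the full list.
merged = count_a + count_b
global_top = merged.most_common(1)[0][0]

# Not mergeable: each client's local winner says little about the global winner.
local_tops = [count_a.most_common(1)[0][0], count_b.most_common(1)[0][0]]
print(global_top, local_tops)
```

Here the most common name overall wins neither chunk, so a client that reports only its local winner gives the server the wrong picture. Reporting full counts fixes that, but at the bandwidth cost the comment warns about.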
Re:Link by MarkGriz · 2009-03-03 09:24 · Score: 5, Insightful

Thank you. Nice that we have "volunteer" editors, since slashdot doesn't seem to employ them any longer.

--
Beauty is in the eye of the beerholder.
Self-defeating idea by whizbang77045 · 2009-03-03 09:26 · Score: 2, Insightful

This seems to me a self-defeating idea. The obvious goal is to get more processing power. Yet using a scripted language is inefficient, and a waste of processing power. If you want more processing power, you need to group computers of the same general instruction set, and which can run compiled or (dare I say it?) assembled machine code.
Rather have a cold PC by Wee · 2009-03-03 09:37 · Score: 2, Insightful

My CPU time isn't idle. It's keeping my laptop from being too hot to touch and too noisy to work on. And there's no reason to pay more for electricity than I already do.

-B

--
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
Re:Why? Why? WHYWHYWHYWHY??? by Instine · 2009-03-03 09:42 · Score: 4, Insightful

and you don't think you could get 100 times more users to visit your web app than you could convince to download and install an exe?

--
Because you can - or because you should?
A bunch of problems by Briden · 2009-03-03 09:51 · Score: 2, Insightful

best comment on TFA:

I think this approach to MapReduce is a pretty creative angle to take on it. However, there are a number of distributed systems-type problems with doing it this way, that would need to be solved to actually make this realistically possible:
1) The dataset size is currently limited by the web server's disk size.
Possible solution: push the data to S3 or some other large store.
2) There is a single bottleneck/point-of-failure in the web server. In theory 10,000 clients could try to emit their map keys all at once to the web server. IIRC, Google's mapreduce elects nodes in the cluster to act as receivers for map keys during the map/sort phase.
Possible solution: Again, if you were using S3, you could assign them temporary tokens to push their data to S3 -- but that would be a large number of S3 PUT requests (one per key).
3) Fault-tolerance -- what happens when a node in the browser compute cluster fails for any of N reasons? How does the web server re-assign that map task? You'd especially want to ensure that computation finishes on a job in an unknown environment such as 1,000,000 random machines on the internet.
Possible solution: If you haven't heard from a node in N seconds, you could reassign their map task to someone else. This is a similar idea to the MapReduce paper's description of sending multiple machines on a single map task, and racing them to the finish.
4) Security -- there is no way to deterministically know whether the data emit()ed from a user's browser session is real or not. How do you trust the output of 1,000,000 users' Javascript browser executions? (I think the answer is, you don't.)
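
The "reassign after N seconds" idea in point 3 can be sketched as a lease table on the server. This is a minimal illustration, not the proof of concept's code: the `TaskBoard` class, its method names, and the chunk ids are hypothetical, and the current time is passed in explicitly so the example is deterministic.

```python
class TaskBoard:
    """Tracks which map tasks are leased out and re-queues silent ones."""

    def __init__(self, task_ids, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = list(task_ids)   # tasks nobody is working on
        self.leased = {}                # task_id -> time the lease was granted
        self.done = set()

    def assign(self, now):
        """Hand out the next task, first reclaiming any expired leases."""
        for task_id, granted in list(self.leased.items()):
            if now - granted > self.timeout:
                del self.leased[task_id]
                self.pending.append(task_id)
        if not self.pending:
            return None
        task_id = self.pending.pop(0)
        self.leased[task_id] = now
        return task_id

    def complete(self, task_id):
        """Record a result; a late duplicate for a finished task is ignored."""
        if task_id in self.done:
            return False
        self.leased.pop(task_id, None)
        if task_id in self.pending:
            self.pending.remove(task_id)
        self.done.add(task_id)
        return True

board = TaskBoard(["chunk-0", "chunk-1"], timeout_seconds=30)
first = board.assign(now=0)      # chunk-0 goes to a client that then vanishes
second = board.assign(now=1)     # chunk-1 goes to a second client
board.complete(second)
retry = board.assign(now=40)     # chunk-0's lease has expired, so it is reissued
```

Ignoring late duplicates in `complete` means a slow client that eventually reports back does not cause a task to be counted twice.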
1. Re:A bunch of problems by AKAImBatman · 2009-03-03 10:08 · Score: 3, Insightful
  
  Further down in the Slashdot comments, a poster also pointed out that Javascript is a poor platform for computationally intensive work. Which I agree with on a general level. The Javascript number system is designed for genericity, not performance.
  In the end this is just a cute idea that has any number of practical problems. Many of them reflect the fact that distributed computing is hard, but many of them also reflect the fact that the suggested platform is less than ideal for this function. Especially if you're going to be pushing workloads that take more time and resources to transmit back and forth than to simply compute them.
  Doesn't stop me from humoring him, though. We all have to dream. ;-)
  And besides, this may just inspire the next fellow down the line to use the technology for a more practical purpose.
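
The point about workloads that cost more to transmit than to compute can be put as a rough per-task comparison. The function name `worth_distributing`, its parameters, and every number below are hypothetical, chosen only to illustrate the trade-off.

```python
def worth_distributing(payload_bytes, result_bytes, bandwidth_bytes_per_s,
                       round_trip_s, remote_compute_s, local_compute_s):
    """Return True if shipping one task out beats computing it locally."""
    transfer_s = (payload_bytes + result_bytes) / bandwidth_bytes_per_s
    remote_total_s = round_trip_s + transfer_s + remote_compute_s
    return remote_total_s < local_compute_s

# Data-heavy, compute-light task: about 1 MB shipped for 50 ms of local work.
data_heavy = worth_distributing(1_000_000, 1_000, 125_000, 0.1, 2.0, 0.05)

# Compute-heavy, data-light task: a few hundred bytes for an hour of local work.
compute_heavy = worth_distributing(100, 100, 125_000, 0.1, 60.0, 3600.0)
print(data_heavy, compute_heavy)
```

Under these made-up numbers only the second task is worth sending out, which matches the intuition that this approach suits jobs with small inputs and long computations.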
  
  --
  Javascript + Nintendo DSi = DSiCade
Re:Another way to hijack a browser? by Darkness404 · 2009-03-03 10:13 · Score: 2, Insightful

But, you can just close the browser or type "killall firefox" and the program dies and you have to go to the URL again to get to it. So, though this is bad for other reasons, yours just isn't one of them.

--
Taxation is legalized theft, no more, no less.
Re:Why? Why? WHYWHYWHYWHY??? by Nebu · 2009-03-03 10:25 · Score: 3, Insightful

"Javascript really isn't suited for this kind of thing, even with worker threads, for two reasons I can think of. First, web clients are transient... they'd have to report back often in case the user clicks away."
I don't see why web clients being transient is a problem. The whole point of the MapReduce algorithm is that each worker (the web clients in this case) doesn't need to know anything about what the other workers are doing, what the system as a whole is doing, or what it has done with any past job.
Re:Pay Me by Anonymous Coward · 2009-03-03 11:01 · Score: 1, Insightful

Start paying attention to the ads. Look for "Click here to help X and get paid ONE MILLION DOLLARS!"
Re:The future of banner ads? by Daengbo · 2009-03-03 18:26 · Score: 2, Insightful

"high-volume sites such as /. could support themselves by taxing, say, 10% of a viewer's CPU with an unobtrusive background thread"
What happens when I open 15 tabs at 10% each?

--
Put identity in the browser.