Swarm — a New Approach To Distributed Computation
An anonymous reader writes "Ian Clarke, creator of Freenet, has been working on a new open source project called Swarm. The concept is to allow a computer program to be distributed across multiple computers in a manner almost completely transparent to the programmer. The system observes the program executing and figures out how the workload should be distributed for maximum efficiency. Swarm is implemented in Scala. It's at an early-prototype stage, and Ian has created a good 36-minute video explaining the concept and the current implementation."
.. was Mosix http://www.mosix.org/
It allowed Mosix-running Linux computers to distribute their loads over other connected Mosix-running Linux computers.
Processes migrate to other nodes transparently. No programming changes were needed.
At first I thought they were talking about Swarm, an "attempt to gather up many different kinds of models that go under the heading of 'agent-based modeling' and create a common language and programming approach" that I've worked with before. I'm surprised they went with the name of an established toolkit in another area of programming. Still, it looks like a cool tool. Another layer of abstraction to make distributed computing easier might make it more attractive to those who don't use it much at the moment.
"goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
I'm sure you would notice an apparently suspicious huge JVM process eating your CPU time. :]
Ezekiel 23:20
You know though, most people don't ever check that. They think that over time Windows just "gets slow" because hardware "goes obsolete". So when that happens they think they have to buy a new computer.
Taxation is legalized theft, no more, no less.
Imagine a Beowulf cluster of... err. Oh.
The thing that's always killed this idea (along with automatic parallelization even on the same machine) is that the overhead of figuring out what's worth distributing, and the additional overhead from mistakes (accidentally distributing trivial computations), often swamps the gains from the multiple processors banging away on it simultaneously. Determining statically what's worth distributing is very hard, since solving it properly is undecidable (basically equivalent to the halting problem), and even solving it in a significant enough subset of cases to be useful has proved difficult. It looks like this project is monitoring dynamically to determine what to distribute, which seems likely to be more fruitful, although historically that approach has suffered from the overhead of the monitoring (like always running your code with debugging instrumentation turned on).
I certainly hope he has a breakthrough vs. past approaches, or it could just be that advances in a lot of areas of technology have given him a better substrate on which to build things that naturally mitigates lots of the problems these things used to have (automatic parallelization research started probably ahead of its time, back in the 1970s, so that most academic stuff was killed off by the 1990s after no really knock-down results emerged). It's not entirely clear to me what the killer advance is, though. The particular variety of portable continuations? A good way of easily monitoring computations? Something that makes the data-dependency analysis particularly easy?
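The overhead argument in this comment can be made concrete with a toy cost model. This is only a sketch under stated assumptions: the function name and its parameters are hypothetical, the speedup is assumed to be perfectly linear in the number of workers, and nothing here reflects how Swarm itself decides what to move.

```python
def worth_distributing(compute_s: float, transfer_s: float,
                       monitor_s: float, workers: int) -> bool:
    """Estimate whether distributing a task beats running it locally.

    All parameters are illustrative assumptions:
      compute_s  - time to run the whole task on one machine
      transfer_s - time to ship code/data out and results back
      monitor_s  - time spent deciding and instrumenting
      workers    - number of machines sharing the work (ideal linear speedup)
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    local_time = compute_s
    remote_time = compute_s / workers + transfer_s + monitor_s
    return remote_time < local_time


# A long-running task easily repays the overhead ...
print(worth_distributing(compute_s=10.0, transfer_s=0.5, monitor_s=0.1, workers=4))
# ... while a trivial one is swamped by it.
print(worth_distributing(compute_s=0.001, transfer_s=0.5, monitor_s=0.1, workers=4))
```

The hard part the comment describes is that `compute_s` is not known in advance; a real system has to estimate it, and the estimate itself costs time.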
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
It sounds like a good idea, but I don't think the project is far enough along in this video to warrant a posting. Maybe he was using too trivial an example to be appreciated in the video, but his explicitly offloading the task to another computer doesn't appear to be very far beyond standard client-server models. If it were already automatically transporting processing between different nodes, it'd be much cooler, but that is not a trivial problem to solve. Deciding what should and what shouldn't be distributed at the application level will be extremely hard, I imagine. If the project were farther along in its maturity I'd be much more interested.
In Ian Clarke's Swarm, World "Hellos" you!
AT&ROFLMAO
Yeah. Visit a website with an applet and you get the JVM startup and it stays up and running even after you leave the website and visit websites that don't have applets. In other words, I probably wouldn't notice at first either and I'd be chugging along until I restarted my machine and saw the JVM pop-up again for no apparent reason. Other folks who have no idea wtf a JVM is would never notice.
It's NOT me! It's the meds! I'm on 1000mg of Fukitol.
Nope, I already use OpenOffice!
AT&ROFLMAO
If I understand what he says correctly, it is something like this: Distributing computation is hard, really hard. It's so hard that nobody ever did it properly. But Swarm will change this! How? Well, we don't know yet, there are so many interesting problems we have to solve first. And you can help!
That's true to some degree. But computers do slow down as they age. Components damaged by the constant heating cause more errors and therefore require retransmission or error correction, slowing things down.
http://fox.eti.pg.gda.pl/~pczarnul/DAMPVM.html (Dynamic Allocation and Migration Parallel Virtual Machine)?
So instead of R-ing TFA, I have to WTFV? Sigh.
The CB App. What's your 20?
My Dell desktop from 1999 has been running like the wind again since last week, when I reverted it to its 2002 state from backup tape. It goes superfast now that it's virus-free, off the network, and running old apps on Windows 98.
I was only trying to recover some old files before junking an unusable machine, but I may keep it around now as a non-networked machine for the kids.
Computer 1: MOV AL...what? No more? MOV AL what? I need a value! WTF am I supposed to do with that!?
Computer 2: 09? Nine? Who gave me nine on its own. That doesn't make any sense! Jeez! Hey, anyone out there missing some data?
Computer 3: Not me, I'm pushing the registers onto the stack
Computer 4: Nope, I've got an INT
Computer 5: Oh, hey, it could be me - does NOP have a value. No? Sorry, my bad!
Computer 1: Nine - yeah, nine - Well, I could stick that in AL if no-one else wants it!?
Computer 3: Oh, heck, give it to 1. I've just got a POP instruction so I am going to obliterate it anyway.....
AT&ROFLMAO
I wasn't at all arguing that old hardware becomes unusable, just that the GGP's post seemed to say (not explicitly, granted) that slowdown is only caused by software, which isn't entirely true.
That's not really bad. It moves the PC market forward.
In my experience, Java is not the reason people buy new computers.
Their computers slow down from viruses, or virus-like Antivirus, and then they think they need to upgrade.
Lately commercially made programs (AIM? Windows Live stuff? Most printer software? Most shareware?) seem to consume as much memory as a whole JVM, despite being written in C. This has led me to conclude that companies really don't give a shit how much memory their software uses. This is quite ironically pushing Java closer and closer to C in actual memory and CPU usage.
Disclaimer: I know C is amazing when used properly - but it seems like only small FOSS projects and apps destined for phones have any sort of optimization work done. I've seen daemons use 200KB on a tiny linux handheld, but multiple megabytes is the norm on any desktop.
YouTube has a 10-minute limit on videos; this video is 36 minutes.
It's unsolvable to do it in advance, but quite possible to do it while observing the running code (a bit like how a filesystem optimizes the locations of data on disk).
I'm sure you would notice an apparently suspicious huge JVM process eating your CPU time. :]
How is that different from any other kind of JVM process?
I do respect Ian, but can't we do this with the existing language infrastructure and just extend it?
---- Booth was a patriot ----
Isn't that what the new vSphere or some up-and-coming release from VMware is supposed to do?
-m
http://www.invisik.com
That's true to some degree. But computers do slow down as they age. Components damaged by the constant heating cause more errors and therefore require retransmission or error correction, slowing things down.
No, not really. PCs are nowhere near that sophisticated. A high-speed CPU bus is not like a DSL connection. Pretty much it has to work near-perfectly, or it's blue-screen city.
For example, I have a couple of Athlon 1.4 GHz machines that are running just as fast as the day I built them, and they've never been turned off. Also have an old Thinkpad R41 ... still as fast as it ever was (faster, actually ... I have it running a stripped-down version of XP.) If you have a motherboard or PC that is getting errors due to heating what you're going to see are crashes and lockups, not slowdowns. Personal computers are not mainframes or minicomputers: even with ECC memory they are not fault tolerant to any significant degree, and frankly I think it's a wonder they work as well as they do (Windows issues aside.) When a component starts generating errors your average PC just breaks ... if you're lucky it's just the faulty subsystem, but if you're not the machine is toast.
People's machines slow down because a. they never defrag their hard drives and b. they get infected. It just takes a single badly written piece of malware to turn an otherwise decent machine into a 386, yet users frequently blame the hardware for being too old, as if that somehow explains poor performance. Many people are completely amazed when I clean up their system for them and pack the hard disk. "Wow, it's like a whole new computer!" No, dimbulb, it's the same computer you've always had, you were just too lazy to give it even minimum maintenance. I'm glad I'm not in IT: it's a lot like being a doctor. You have to deal with people who have no ability to think rationally about their problems, and even when you give them good advice they never follow it anyway.
The higher the technology, the sharper that two-edged sword.
I think he should fix the monstrosity that is Freenet before he jumps onto other things.
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
Erlang apparently gets it right. Scales smoothly from single core to multi-core to multi-server in a near linear fashion. Astonishingly reliable, having achieved nine nines of uptime - much less than a second of downtime - in a year. Purposely designed to mitigate shared memory problems. Built for hot-switchover - you can upgrade Erlang programs without closing them first!
In just about every conceivable way, Erlang is the right choice for high-end multi-core multi-system clustered application development. I have a large-stack, clustered application written in PHP. While it works well, there are limits to what we can do within a single process - a problem that's likely to become worse over time as needs continue to scale up. If I were to do it all over again, I'd take a good, hard, look at Erlang.
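The "nine nines" figure in this comment can be checked with simple arithmetic. A minimal sketch, assuming a 365-day year and taking "N nines" to mean an unavailable fraction of 10^-N; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


for n in (3, 5, 9):
    print(n, "nines:", downtime_seconds_per_year(n), "seconds per year")
```

Under these assumptions nine nines works out to roughly 0.03 seconds (about 32 milliseconds) of downtime per year, which agrees with the "much less than a second" claim. For comparison, five nines allows a little over five minutes.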
I have no problem with your religion until you decide it's reason to deprive others of the truth.
I read that as "distributed copulation" for some reason. I need more sleep.
"Ian Clarke of Freenet fame"... Actually, practically no claim about Freenet came true. The authors advertised "anonymity" etc. etc. at the same time as university professors published studies of statistics about the snooped connections: for any node present on the network for some time, it is elementary to collect IPs.
It was painful to see so many users completely duped by the untrue claims, which their authors knew pretty well were untrue (and of which fact one-word admissions can be found buried somewhere in the wiki on their site).
Ian Clarke and his collaborator knew nothing of the concept of the Small World (a type of graph that naturally grows in the case of such networks), and therefore were not aware of the conditions (i.e. parameters that have to be set for the connecting nodes in their network) needed to make this network self-sustaining. When pointed to the concept, they chose models by Newman, a prolific publisher of those arising from computer-simulated abstractions, rather than Barabashi (I'm afraid I misspelt his name), who offers much more realistic and practical ideas about this kind of topological network structure.
People do not change, really.
So what I'd expect from this announcement is a repetition of the story with Freenet: a real and interesting problem, inflated claims, and no actual solution, just claims and "development" for years to come.
So I remain a pessimist.
So long story short it is time that we get some proper software running on those computers.
For 99% of the people a computer is an appliance, like the TV and the stereo. They do not get more maintenance other than being dusted off once in a while.
Defragging hard drives: is that still necessary in the Windows world? I stopped doing this more than 15 years ago, at the time running OS/2 and its HPFS.
Getting infected: yes that's an issue and I have honestly no idea how to really prevent this. Even a fully locked down O/S will always allow infections to take place, as long as there is a human factor present.
Computers should be considered low-maintenance appliances by the designers, and hardware and software should be designed and written with that in mind.
Is there potential to use this on a GPU? The current problem with GPU programming seems like something Swarm could solve.
I used SWARM a year ago and I was impressed by the possibilities it offers. It was also pretty stable. I'm sure it will have reached a very reliable level of stability by now.
The largest prime factor of my UID is 263267.
People's machines slow down because a. they never defrag their hard drives and b. they get infected.
You also need to take into account that they may install new service packs and other software to tread water, not to mention adding new bloatware. For example, probably hundreds of thousands of PC desktops and laptops were sold with Windows XP Home and 256MB of RAM, and were not slow at the time. But try running them today with Service Pack 3 and adding antivirus (like AVG) and firewall (like ZoneAlarm) programs, without adding any RAM, and they're terrible. Now consider that many of those same systems also shipped with Microsoft Office 200x, and what the latest service packs for that add to the load.
Get off my launchpad!
way way back, IBM did some stuff with Java and agents... http://wapedia.mobi/en/Aglets
Learn the broken window fallacy, please.
I feel fantastic, and I'm still alive.
It's the fricken $NtServicePacks in C:\WINDOWS that slow the whole damn thing down. But don't delete them manually; rather, run "Disk Cleanup" and let it sync the corresponding registry entries, so that Windows can revert to its last known good configuration ... 3.1 IMO
RTFM is not a radio station.
I've not RTFA'd yet, but on first blush, this sounds suspiciously like the Amoeba project/work/files. The net result of Amoeba was that you'd end up with a large virtual machine, comprised of many individual machines scattered across different sites. How is this different?