OpenMosix
Francesco Taurino writes "Moshe Bar has released a new Mosix system: openMosix.
From the site:
"For thousands of users Mosix has been a reliable, fast and cost efficient clustering platform. There are hundreds of Mosix installations in life sciences, finance, industry, high tech, research and government environments. The goal of openMosix is to give to these users a continued support and an up-to-date platform. openMosix is initially fully compatible with the last Mosix (1.5.2 for 2.4.13) kernel, but is now growing in its own direction.
If you would like to contribute to the openMosix project, drop a line to moshe@openmosix.org.""
OpenVMS has the most robust and fastest clustering I have ever seen. I just can't wait until Linux can do this.
This page left intentionally blank.
From the Mosix home page:
Only Prof. Amnon Barak is authorized to represent MOSIX.
...???
What's the story here, Moshe?
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
Imagine a Beowulf cluster of these!!
Oh yeah, this is mosix, bummer.
"And we have seen and do testify that the Father sent the Son to be the Savior of the World"
1 John 4:14
I just finished implementing my 6 node MOSIX cluster, and I ran across several bugs, and I couldn't find any place to report them. The MOSIX development list is closed subscription, and apparently the good Professor ignores his email.
I'm not clear about some things though... How is MOSIX currently licensed? Why are they being so closed about development?
I've had enough abrasive sigs. Kittens are cute and fuzzy.
MOSIX 1.5.7 for Linux 2.4.17 (K-MOSIX) is out, according to Freshmeat. Therefore "the last Mosix (1.5.2 for 2.4.13) kernel" seems incorrect.
and since information is a bit lacking at the link provided, here's a link to the regular mosix FAQ.
To-do List: Receive telemarketing call during a tornado warning. Check.
.. that the Mosix site is running on a Mosix cluster to withstand the Slashdot effect
"Never let the truth get in the way of a good story..."
I guess I wanna know why there was a fork. I respect both the big Prof and Moshe from what I have read of theirs. Moshe says that Mosix is going in other directions, which sounds kinda...vague ;)
-- Who is the bigger fool? The fool or the fool who follows him? --
what happens if a node in a mosix cluster dies? i've hunted through the docs to no avail.
US Citizen living abroad? Register to vote!
Well, I wanted to read the FAQ to find out what Mosix was, exactly, but apparently you have to be an admin to get to the FAQ. That sucks.
I don't have a link handy, as it's been a few months. I found it to be fairly simple to install... I had 4 PIII machines, set them all up on an internal network and nfs mounted a directory from the head. From there, it was a simple series of steps:
Unpack kernel sources.
Run the Mosix install script.
Did that on each node, then started the mosix service on each.
It worked like a charm for large computations, but had three flaws for normal use.
1) By default, it does not auto-migrate, which was pretty dumb. And getting it to auto-migrate was buried deep in the docs, though it could be guessed from reading up on locks. (echo 1 > /proc/self/lock, I think it was)
2) Migration only occurs after a certain load average is maintained... if your job involves spawning multiple short-lived processes, like a large compile, it doesn't migrate anyway.
3) Network usage for migration was very heavy over Fast Ethernet.
There you have it. It's the last reason that MOSIX isn't used often in commercial clusters, but it seems well-suited for other distributed computing applications, and has some interesting features, especially for NOW configurations.
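To make point 2 concrete, here is a toy model of a "sustained load" trigger. It is only a sketch of the behaviour described above, not MOSIX's actual algorithm: the threshold, the sustain window and the one-second sampling are all made-up numbers, and the function name is hypothetical.

```python
def migrates(lifetime_s, load_samples, threshold, sustain_s, sample_interval_s=1.0):
    """Return True if the process is still alive at the moment the load has
    stayed above `threshold` for `sustain_s` seconds (hypothetical policy)."""
    above_for = 0.0   # how long the load has been continuously above threshold
    elapsed = 0.0     # age of the process
    for load in load_samples:
        if elapsed >= lifetime_s:
            return False  # the process exited before the policy triggered
        above_for = above_for + sample_interval_s if load > threshold else 0.0
        if above_for >= sustain_s:
            return True
        elapsed += sample_interval_s
    return False


busy = [4.0] * 60  # one minute of constantly high load, e.g. a big compile

print(migrates(2, busy, threshold=2.0, sustain_s=10))    # short compiler run
print(migrates(600, busy, threshold=2.0, sustain_s=10))  # long number cruncher
```

Even though the node is busy the whole time, the two-second process is gone before the trigger fires, which matches what I saw with compiles.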
- Free tabletop fantasy gaming! Grey Lotus
http://www.ai.mit.edu/lab/sysadmin/cluster.html
There are several limitations to what MOSIX can
currently do. Java native-threads is one thing,
since MOSIX cannot migrate apps using shared
memory all native-threads applications will stay
on the node they started on, green-thread
based application can migrate though the
internal threads aren't exposed to the OS so no
real parallelism is achieved. MOSIX also can't
migrate sockets so I/O bound problems also
stay at home. Mixed I/O and CPU jobs can
migrate for CPU cycles, but are brought back for
the I/O ops. In the limited testing to date,
processes that can't take advantage of MOSIX
don't seem to be hindered at all by it
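The rules in that quote can be written down as a small decision function. This is only a restatement of the quoted limitations, assuming three yes/no properties per process; the class, field and function names are invented for illustration and have nothing to do with MOSIX internals.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    uses_shared_memory: bool = False  # e.g. Java native threads
    does_io: bool = False             # sockets and other I/O handled at home
    cpu_heavy: bool = False


def placement(p: Proc) -> str:
    """Classify a process according to the limitations quoted above."""
    if p.uses_shared_memory:
        return "stays home (shared memory cannot migrate)"
    if p.does_io and not p.cpu_heavy:
        return "stays home (I/O bound)"
    if p.does_io and p.cpu_heavy:
        return "migrates for CPU, brought back for I/O"
    if p.cpu_heavy:
        return "can migrate"
    return "no reason to migrate"


for p in (
    Proc("native-thread JVM", uses_shared_memory=True, cpu_heavy=True),
    Proc("network daemon", does_io=True),
    Proc("mixed job", does_io=True, cpu_heavy=True),
    Proc("number cruncher", cpu_heavy=True),
):
    print(f"{p.name}: {placement(p)}")
```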
I know it would break user land tools that were written with bad assumptions - that's why they should be fixed. top, for example, would require all of a 50 line patch.
It was a very bad decision to ever make a private kernel constant public (like HZ). There should have been a system function for it (or a /proc entry) to prevent this stupidity.
There was a thread on the lkml in the summer of 2001 where they discussed getting rid of HZ altogether and keeping the outward appearance of a phony HZ for just such broken userland programs. Why not take that approach?
HZ = 100 in this age of multi-GHz x86 machines is insane.
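For what it's worth, POSIX does give userland a way to ask for the tick rate instead of compiling in 100: sysconf(_SC_CLK_TCK), which Python exposes as os.sysconf("SC_CLK_TCK"). A rough sketch of how a top-like tool could convert the utime/stime tick counts from /proc/<pid>/stat without assuming a value (the helper names and the sample line are made up):

```python
import os


def ticks_per_second() -> int:
    """Ask the system for the clock-tick unit instead of hardcoding 100."""
    return os.sysconf("SC_CLK_TCK")


def cpu_seconds_from_stat(stat_line: str, ticks: int) -> tuple:
    """Convert utime and stime (fields 14 and 15 of /proc/<pid>/stat)
    from clock ticks to seconds."""
    # The command name in parentheses may contain spaces, so split after
    # the last ')' rather than on whitespace from the start of the line.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so fields 14 and 15 are at 11 and 12.
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    return utime_ticks / ticks, stime_ticks / ticks


# Made-up sample line, truncated after the fields we need.
sample = "1234 (my prog) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 1 0"
print(ticks_per_second())
print(cpu_seconds_from_stat(sample, 100))
```

A tool written this way keeps working whatever tick unit the kernel reports to userland.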
Can I mix Mosix and openMosix nodes in the same cluster?
No. Just like the older Mosix, you should not mix nodes because the protocols are subject to un-announced changes from version to version. On top of that, every new version has bug fixes which warrant updating to the new kernels.
Who is the copyright holder of openMosix?
All the old Mosix code is copyright by Prof. Amnon Barak of Hebrew University of Jerusalem. All new code of openMosix is copyright 2002 by Moshe Bar, Tel Aviv.
How do I upgrade to openMosix?
openMosix maintains for now compatibility with the user-land tools of Mosix 1.5.2. I also have a port to openMosix of the user-land tools which will be released soon. To upgrade to openMosix, simply download the openMosix patch from www.openmosix.org and apply the patch with
patch -Np1 < openMosix1.5.2moshe
to a stock Linux kernel of 2.4.14 or 2.4.16 respectively. Make sure to get your old .config file (the .config file remains compatible) and recompile your kernel and modules. Then, reboot.
Is openMosix a fork of Mosix?
Right now, it is a pure fork. Eventually, it will become a separate clustering platform with its own distinct feature set and behaviour.
I really appreciate the work that Barak has done with and on Mosix. But I also find it kind of odd that Mosix could be the "property" of one individual. I would assume that it was developed with public research grants and while the author was employed at a university. Graduate students probably have also contributed, and there probably have been bug fixes as well. So, maybe it isn't bad if there is a GPL'ed distribution of Mosix after all. The GPL regulates issues of ownership rather well.
As for a user-space implementation of Mosix, I think that makes sense, although it has its drawbacks as well. One of the problems with user-space implementations is that they tend to be less than transparent in practice. It also strikes me as somewhat redundant, since Condor has already gone the user space route. A userland Mosix would only make sense if it were free and open source (as opposed to Condor).
Altogether, I hope people won't get too upset at each other over this. Mosix is great stuff and Barak and his university have been generous in making it available freely up to this point.
...and recommend it!
MOSIX is great for general-purpose clustering. Processes are scattered across a cluster automatically without having to modify the programs. No API is needed other than usual Unix-level process use and it allows parallel execution of any program, although full use requires a parallel program design.
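As an illustration of "no API other than usual Unix-level process use", here is a minimal sketch of a job that just starts several independent CPU-bound processes. Nothing in it is MOSIX-specific; on a MOSIX node, whether any of these children actually get moved to another machine is entirely up to the kernel. The worker code and function name are invented for the example.

```python
import subprocess
import sys

# Each worker is a separate, ordinary Unix process doing CPU-bound work.
WORKER = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


def run_workers(sizes):
    """Start one independent process per problem size, then collect results."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in sizes
    ]
    # All workers are already running concurrently; now wait for each in turn.
    return [int(p.communicate()[0]) for p in procs]


print(run_workers([10, 100, 1000]))
```

The parallelism comes from the program's own design (several processes), which is the "full use requires a parallel program design" part.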
I just wish that it would go in the official linux kernel, something like
CONFIG_MOSIX=y
It's a great chance for Linux to do more than play catch-up with other flavors of Unix - it can take the lead and give us the ability to create clusters using the tools in the standard distribution!
I had been using mosix for quite a while until recently. The 2.4.13+ext3+mosix release didn't even work properly with all of my hardware; oddly, my networking refused to work at all. Other kernels had no problems, and a kernel built with the same .config worked fine on another box.
Anyway, I had noticed that 2.4.13 was sticking around awhile.. meanwhile, the page was being updated.. adding some information about a user-space version and the 'disclaimer' on the download page.
Mosix sounds like a good thing, but in reality it isn't very suitable for many of my common tasks. The biggest problem is the lack of support for programs using shared memory. Apache, MySQL, these do not migrate. Also, programs using X Windows will not migrate well.. as they are network bound and will migrate back to their home node once they need to report to the X server. Basically, don't install mosix on 10 of your home machines and expect any kind of performance increase.. besides, you could probably toss out your dual 1GHz PIII, replace it with a 400MHz Celeron and not notice a difference.
It would be nice if there was a transparent pseudo layer for things like X.. where all of the machines have their IP for communicating via mosix.. but all 'outside' communication would be made via mosix through a 'public' IP. That public IP would then be used to connect to the X server; hence, network-bound apps could migrate easily as they would still have the same IP and (spoofed) MAC address. Basically it would be building a NAT router into Mosix for the idea of being able to migrate network-bound applications. It sounds more complicated than it is, but less complicated than it is to implement.. and I probably don't make any sense, but I know what I mean :) I do, of course, assume that source IP is the biggest stumbling block in migrating network-bound applications.
You miss the whole point of MOSIX. At our Center, we use it to manage the execution of MANY jobs across the cluster. If it is not beneficial to do so, the job won't migrate. This is an entirely different type of cluster than MPI or PVM with a different goal in mind (and consequently, different pros/cons). In our case, MOSIX is a better solution because it doesn't require domain decomposition.
665: The mark on the forehead of Satan's slightly less evil brother, Stan.
For you Mandrake users, I head a project to include LTSP and Mosix on a Mandrake configured kernel; to package and explain in very easy terms the whole process, and then eventually release a stripped-down Mdk, geared towards education (edu-tech is pretty much my field) ala K12 LTSP. We call it The Mandrake Mosix Terminal Server Project. Check it out and lend a hand if interested. Thanks.
put the what in the where?
Insert here.
Breakfast served all day!
From my brief experimentation with Mosix and a bit of reading, this sounds correct.
Basically, mosix is a very "chunky" sort of clustering - it works on the level of "whole processes". Because of this, you don't need to write your software to do the splitting and migrating yourself as you do with "less chunky" pvm and mpi. On the other hand, a process split off from a pvm program running can be handled by mosix like any other process and migrated to the cpu that mosix thinks can get the process finished fastest.
Mosix seems like an ideal way to 'lend' processing power to slower machines. This is what I was doing when I played with it previously - I had a K6/2-350 and a P-100 laptop with no L2 cache. I got Mosix set up on them both and used a command-line mp3 encoder as a benchmark. On the P-100, encoding speed was about 15% of realtime. On the K6/2, it was about 110%. Running Mosix between the two over 10Mb Ethernet, I could encode mp3 at about 85% - I suspect it'd have been significantly closer to 100% if I'd had 100Mb Ethernet at the time...
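Putting those numbers side by side (rates are fractions of realtime, taken from the figures above; the 240-second track length is just an assumed example):

```python
# Encoding rates as a fraction of realtime, from the benchmark above.
P100_ALONE = 0.15      # P-100 laptop encoding locally
K6_ALONE = 1.10        # K6/2-350 encoding locally
P100_VIA_MOSIX = 0.85  # job started on the P-100 with Mosix over 10Mb Ethernet


def seconds_to_encode(track_seconds: float, rate: float) -> float:
    """Wall-clock time to encode a track at the given fraction of realtime."""
    return track_seconds / rate


speedup = P100_VIA_MOSIX / P100_ALONE   # gain seen from the slow machine
efficiency = P100_VIA_MOSIX / K6_ALONE  # share of the fast machine's native rate

print(f"speedup on the P-100: {speedup:.1f}x")
print(f"fraction of the K6/2's native speed: {efficiency:.0%}")

track = 240.0  # assumed 4-minute track
for label, rate in (("P-100 alone", P100_ALONE),
                    ("via Mosix", P100_VIA_MOSIX),
                    ("K6/2 alone", K6_ALONE)):
    print(f"{label}: {seconds_to_encode(track, rate):.0f} s")
```

So the laptop got a bit under six times faster, at a bit over three quarters of what the K6/2 manages on its own; the remainder is presumably migration and network overhead.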
Hopefully OpenMosix will keep up with current kernel versions better. Better still, maybe they'll be able to get it merged into 2.5 at some point...
Hacker Public Radio is our Friend
We run Mosix, and have had jobs fail. If a node goes down that is running a process, two things can happen (and have happened): the process dies, or it restarts on another node. What determines this? I have no idea. I can only speak from observation.
It is suggested in the documentation that you have a large swap space on your disk to handle nodes going down. Perhaps with a cached copy of a process it will live on.
In any case, it's not scientific but I thought I'd throw that in.
I found out the news yesterday, and I've already applied your patch + XFS. We've run this on 1.5.2 and we haven't had too many problems with this setup.
Coding wise the conflicts seemed trivial (and many times redundant). To minimize potential conflicts we don't use MFS and we don't use the debugger.
The only difficulty came when we started using Mosix 1.5.2: we had intermittent periods of "Too many open files" when a node went down. Somehow we've avoided them for the past month; we think this may have more to do with AutoFS.
I'm wondering if you would like the diff from this? I'm also interested in helping with the DSM development and socket migration. I may be slow on the uptake but where can I start and help out?
1) By default, it does not auto-migrate
;)
Hmmm, maybe that is why it all of a sudden started working when I re-installed it. Anyway, I highly recommend MosixView [www.mosixview.com] for Mosix administration. It is an effective but simple way to monitor and adjust your cluster.
2) Migration only occurs after a certain load average is maintained
I believe that is what Prof Amnon is using for developing U-Mosix. From the home www.mosix.org page...
"U-MOSIX provides even load distribution using several of the algorithms of K-MOSIX. U-MOSIX is better tuned for cluster and GRID computing, including the ability to handle large number of short processes, run in heterogeneous clusters, with different versions of Unix such as FreeBSD, Linux and Solaris."
For those of us that don't want to wait for U-Mosix for grid-computing (also known as cluster queueing) I suggest Sun's Open Sourced Grid Ware Engine. It comes complete with a Beowulf Cluster built in.
3) Network usage for migration was very heavy over Fast Ethernet.
Actually, we haven't noticed much of a load at all.
Btw, we are a commercial cluster
Moshe Bar has released a new Mosix system: openMosix.
The word "bar" is Hebrew for "free"... Free as in speech, not beer, believe it or not!
Duh.