Mosix 1.0 Released

Re:Just keep in mind... by volsung · 2001-05-05 06:37 · Score: 3

A "green threads VM" is a Java Virtual Machine that does not use the native operating system threading mechanisms to implement the Java threads. Instead, the JVM acts like one single threaded process to the operating system and threads the Java app internally. Green threads VMs are generally slower (cuz the OS can manage task switch more efficiently), but since they are just a single process, they can migrate to a new box without any problems.

Re:Finally, something resembling clustering for Li by LunaticLeo · 2001-05-05 02:04 · Score: 3

Why do you feel compelled to post on a subject correcting other people's ignorance and yet being so profoundly ignorant yourself?

Repeat after me "THERE ARE MANY KINDS OF CLUSTERS". Again...again...again....

Now we will play a game: match the clustering technology description to a popular name. Match the letter to the number.

A. Message passing clusters used primarily for low bandwidth parralelel computation.
B. Load balanced single protocol network clustering.
C. Hardware takeover / hardware redudency for Hi-Availability clustering.
D. Load balanced, homogeneous platform, with process migration clustering.

1. Veritas Cluster Server with Sun Multipath IO devices.
2. Arrowpoint-type web load balancer.
4. Mosix.
3. Beowulf.

For extra credit:
Is the above listing of clustering technologies comprehensive? [Y]es [N]o

Answers available from those with a clue after class.

--
-- I am not a fanatic, I am a true believer.

Re:[OT] deluge of overrated posts by landley · 2001-05-05 04:53 · Score: 3

The moderation range is saturated. No doubt about that. The obvious solution is increasing the moderation range (possibly all the way to ten, for future growth), but first we've got to get Slash HQ to acknowledge that there is a problem.

There used to be a moderation category that was "just the best, most pithy synopses of the dicussion". Now that can easiy be 30 posts, and reading them doesn't fit in 3 minute "while this compiles" break anymore.

Part of it is that there's more posters these days, and more moderators, and the top 5% of 50 posts is a lot smaller than the top 5% of 500 posts.

Part of it is the automatic +1 of posters with a history of good karma. This is a good thing, but it reduces by 25% the range that can only be reached by active moderation. (The original moderation range of 2-5 has been reduced to 3-5. You used to be able to read at 2 and filter out the stuff that hadn't been voluntarily moderated up at least once. That's no longer the case, and even Einstein wasn't ALWAYS worth listening to. Sometimes he was just ordering breakfast, or complaining about the weather.)

Zero used to be a penalty for posting as an anonymous coward (since the troll ratio there was higher). 1 was standard. 2 being experienced poster who generally has somethng to say, that's meaningfull. This is a good heuristic for a starting position, but there's not enough room to go up fromt here, the system is swamped.

Slashdot has outgrown that range, even WITHOUT raising the floor. More marginal opinions less universally approved of (and less central to the topic) now reach the top category, because they have more opportunities to be moderated up. 5% of the viewership can easily spend 5 moderation points now.

perhaps we can go to a moderation percentage system? "Show me just the top 5% of posts"? Or sort them by popularity and give me the top fifteen...

It's an interesting problem.

Rob

Re:Finally, something resembling clustering for Li by landley · 2001-05-05 05:25 · Score: 3

>Repeat after me "THERE ARE MANY KINDS OF
>CLUSTERS". Again...again...again....

>Now we will play a game: match the clustering
>technology description to a popular name. Match
>the letter to the number.

Berries come in clusters. Stars come in clusters. Military rank insignia come in clusters...

Californians... No wait, this is a family oriented area.

Rob

(Austinite. They move here and can't drive, so we get to make fun of them.)

Re:Distributed system failure? by landley · 2001-05-05 14:39 · Score: 3

>SMP and NUMA are different problems because they
>have different failure characteristics.

It's a question of what problems you want to address. It's entirely possible to have multitasking multiuser operating systems without virtual memory. (Just about every 1970's era unix before the Vax, actually.)

Doesn't make the problem fundamentally different, just that there's more cases to cover. Do you always check for a non-null return from your mallocs, or do you just say "the system should just never run out of memory"?

>As far as I know, an SMP operating system
>assumes that, if CPU #2 was there just a moment
>ago, it will still be there.

Three words: Hot pluggable hardware.

And yes, they're talking about adding that capability to the Linux kernel in 2.5. (Although the current patch has a /proc entry to switch the appropriate processors of and on before just yanking them. Then again, PCMCIA proves you can do it without manual notification since you get several miliseconds of warning, which is ages to the computer...)

>What happens when your operating system needs to
>fault in a page, but your distributed VM manager
>lost network contact with your other server(s)?

Well, when piranha.rutgers.edu did this (no local hard drive, it swapped through the network to the server in the back room), its response was to die spectacularly (sunOS didn't blue screen, it white screened). This is not a new problem.

Then again, how many apps never check the return value of malloc and just expect the OS to go down if the system runs out of memory anyway?

If you were really swapping through the network (despite hard drives being cheap they ARE failure-prone moving parts), I'd say use distributed redundant swap devices and treat them like RAID 5 so you can loose one and recover the data? Also avoids network bottlenecks. But then you're eating network bandwidth needlessly, which is usually your limiting factor. (Then again, you page fault all sorts of other stuff through the network anyway in a shared memory config, it wouldn't so much be swapping as a larger distributed memory management system.)

It's an open question on the best way to go. Performance vs reliability is often a tradeoff. But there are PLENTY of different options.

>How can the operating system handle this error
>gracefully? Or politely warn the userspace
>application? :-(

How does RAID 5 do it today? (Let's see, SMART disks, battery packed up power supplies notifying of failure, hot pluggable hardware... It'll probably all get molded together someday into pseudo-coherent infrastructure of dynamic system status.)

The most graceful thing for the OS to do may just be to suspend the app and save off its state until it can continue. It depends. As I said, there are a lot of options.

Rob

Distributed system failure? by cpeterso · 2001-05-05 11:39 · Score: 3

SMP and NUMA are different problems because they have different failure characteristics. In distributed programming, you often must expect network failure to be a common occurence and handle those errors gracefully. As far as I know, an SMP operating system assumes that, if CPU #2 was there just a moment ago, it will still be there.

What happens when your operating system needs to fault in a page, but your distributed VM manager lost network contact with your other server(s)? How can the operating system handle this error gracefully? Or politely warn the userspace application? :-(

--
cpeterso

I just wish... by loony · 2001-05-05 00:42 · Score: 3

that it would go in the official linux kernel...

CONFIG_MOSIX=y

It's a great chance that Linux doesn't only play catch up with Windows or other flavors of Unix - it can take the leader ship and give you the ability to create clusters using the tools in the standard distribution!

Lingering bugs by De · 2001-05-05 00:56 · Score: 3

I've been trying to run this for the last few days, and I've gotten so many kernel oopses that I've had to revert back to a standard kernel. YMMV

Mosix a valuable technology by BierGuzzl · 2001-05-05 03:22 · Score: 3

Unlike other paralell processing environments, mosix is a solution that can in many cases be put to work with just a few changes in the init scripts and a kernel recompile -- no applications or libs need be changed, and things like web servers can take advantage of it right out of the box.

Re:Finally, something resembling clustering for Li by gavrie · 2001-05-05 04:03 · Score: 3

On the issue of true device sharing:

There is a possibility of using MOSIX together with GFS (which gives true device sharing) so that you don't need to use something like NFS. This way, a migrated process will be able to access the device directly, without needing to go through its home node.

AFAIK, this option is still not production-level, though.

Re:[OT] deluge of overrated posts by Anonymous Coward · 2001-05-05 00:57 · Score: 4

[OT] deluge of overrated posts (Score:5, Interesting)

the irony is thick.

Mosix is definitely cool... by Anonymous Coward · 2001-05-05 00:38 · Score: 4

I'm in an academic department that does a veritable sh*tload of computations, and we've been using it for nearly a year to load-balance a bunch of P2-350s. It's great, makes those machines feel "loved", and keeps people's research on progress. It's great when you can break up your problem/program into a bunch of smaller ones. But it's not perfect, and as with all clustering solutions, doesn't do the hard work (algorithm parallelizing) by itself. However, it is going to form the backbone of our dept system, replacing a pair of big-iron Sun servers...

Re:BS by MSG · 2001-05-05 04:19 · Score: 4

It should be noted that Linux takes longer to switch tasks on the PIII if it was compiled to support SSE2 instructions. Five dollars says that FreeBSD doesn't support them. Perhaps those benchmarks would have showed different results if the kernel had been built without SSE2 support?

http://linuxtoday.com/news_story.php3?ltsn=2001- 05 -03-007-20-NW-KN

[OT] deluge of overrated posts by AiX2 · 2001-05-05 00:49 · Score: 4

The past month on Slashdot, a disportionate number of posts have been marked +5 Interesting. In the past, +5 Interesting has been reserved for especially well written and clued in posts. Slashdot needs to change either the number of people receiving moderation points or increase the maximum a post can be rated to 10. If this were to take effect, I could simply read the 5-6 best written or insightful comments instead of the posts people feel they have to waste their mod points on.

(just my drunken rambling)

--Ryan

learn your history by janpod66 · 2001-05-05 01:18 · Score: 4

The Mosix research group has been working on clustering for many years:

So far MOSIX was developed 7 times, for different versions of UNIX and architectures. It has been used as a production system for many years. The first PC version was developed for BSD/OS. The latest version is for Linux on X86/Pentium/AMD platforms.

Yes, they did start out basing their system on proprietary kernels, then they moved to BSD, then to Linux. The current work is not about the basic idea anymore, moving processes around somehow, but about things like distributed virtual memory, distributed file systems, and migration strategies.

This isn't "playing catch-up", it is cutting edge research by the people who did the original work moving to the BSD and Linux platforms because they are more widely available, are better supported, are easier to license and share, and have more software available for them.

Re:Finally, something resembling clustering for Li by glenmark · 2001-05-05 02:59 · Score: 4

My comments were not rooted in ignorance, but rather an intimate familiarity with the technology developed by the DIGITAL engineers who INVENTED clustering, back in the days before the term was diluted by use in situations that have no relationship to the original application of the term.

Veritas Cluster Server is actually a clustering technology (only marginally, since, like Microsoft clustering, it employs a shared-nothing model). I have no problem with the application of the term here.
Arrowpoint? No. Load balancing is useful in a cluster (and even more useful when applied to all networking protocols, not just HTTP), but load balancing alone does not a cluster make.
Mosix (homogeneous process environment) is borderline. It just needs the addition of a few key technologies to qualify. (DLM, device sharing)
Beowulf (distributed parallel computing environment), while an interesting and useful technology, has nothing whatsoever to do with clustering! The term is misused here, and that is what annoys me. If you were to classify Beowulf as a clustering technology, you would also have to call every computer participating in SETI@home a single cluster, which it clearly is not...

And no, that list is far from complete. No mention of HP's clustering technology, Compaq's OpenVMS Clusters, True64 Unix TruClusters, or Tandem NonStop Fault-Tolerant clusters, or Microsoft Clustering (although, again, that is a VERY weak form of clustering, and lacking in several respects).

Essentially, this is an arguement over semantics, over the definition of the term "cluster". I merely oppose dilution of the meaning by applying it to lesser technologies which have no relationship with the original meaning of the term.

--
*** Quantum Mechanics: The Dreams of Which Stuff is Made ***

A quick primer on types of paralell systems. by landley · 2001-05-05 05:20 · Score: 5

There are traditionally three different types of paralell processing systems: SMP, NUMA, and networked clusters like Beowulf. In reality, these form a continuous range, with SMP at one end, Beowulf at the other, and NUMA in the middle.

SMP is Symmetrical Multi-Processing, or one computer with multiple processors just like multiple hard drives, multiple serial ports, or multiple banks of RAM. In an SMP setup, each processor has equal access to the other system resources, and although they may need locking to avoid stomping on each other's activities, it's no more expensive for processor #2 to access a certain resource (such as an area of main memory) than it is for processor #5 to do so. Thus there's no real reason to shuffle processes around to be "closer" to some other resource.

The other end of the spectrum is message passing networked clustering, like beowulf, where isolated systems (each with its associated set of resources) accept complete tasklets, work on them more or less alone, and output the results. Accessing resources from the rest of the cluster is very expensive, and you try not to do it more than absolutely necessary (once per transaction). A message comes in with all the info a node needs to do its work, and the node sends a message back out with the result and to announce it's ready for the next mouthful.

NUMA is in between, and it stands for Non-Uniform Memory Architecture. You have a bunch of similar processors, like in SMP, but some resources are "close" to each processor and some are far away.

Remember, clusters own resources outright, this is my node's memory. On SMP all processors access a pool of shared resources (like main memory) at the same speed (hence symmetrically). On NUMA, processor #53 -CAN- access memory over by processor #1736, but it'll take much longer than if it accesses memory near itself. It'll block, it'll have wait states. (Just like accessing a page swapped to the hard drive vs accessing one in memory.)

The thing is, as systems on either end become more complex they move towards NUMA. Think mondo SMP systems with dozens of processors, each of which has megabytes of L1 cache. You want to keep stuff "in cache" rather than accessing main memory, and sometimes you wan't to access something that's currently in some other processor's cache. Cache line pollution and such. That's a NUMA type of problem.

From the other end, once you start connecting beowulf clusters together with really high speed interconnects (like gigbit ethernet or myrinet, and often speed here is more a question of latency than bandwidth,) and start teaching them how to pretend to be one big shared memory image by page faulting through the network, you're approcaching NUMA from the other end. Stuff's in my machine's memory locally right now, and swapping it in from some other guy's memory (and swapping out some of my stuff to make room for it) is something I only want to do when absolutely necessary, because it slows me down.

MOSIX is taking beowulf clusters in the direction of NUMA. This is a good thing, it makes them more flexible and capable, but it opens up a whole can of worms to optimize it properly. (Not a new can of course, the kernel hackers are already dealing with a rather significant portion of NUMA's issues just trying to get 32 processor alphas to work smoothly.) If the interconnects between clusters were perfect, we could just treat it as one big SMP machine. Then again if our hard drives were as fast as our ram we wouldn't try so hard to minimize swapping, would we? You could still just treat MOSIX as SMP instead of NUMA if you don't want to optimize your performance. And for many things that's a fine solution, just distributing it cross the cluster gives you all the performance you need, and adding nodes is more cost effective than rewriting your app for greater speed in the new environment.

But performance hits of thrashing all your pages through the network can be just as bad as thrashing them in and out of the swap partition. And performance is the only reason we're using clusters in the first place, isn't it?

And NUMA optimization just makes maintaining locality of reference, streamlined locking, and minimizing contention for commonly accessed resources even MORE important. It's the same kind of thing you'd do on a normal SMP machine anyway, it just has more of an impact, because there's more inefficiency to optimize away.

Rob

17 of 77 comments (clear)