Slashdot Mirror


Distributed Operating Systems?

ayejay asks: "Are there any models/designs for a totally distributed operating system, possibly utilizing AI to learn patterns of use, resource need, and anything else that might be relevant? What -would- be relevant to such a thing? Given Napster and all the load balancing kernel enhancements and SETI@home type programs out there, it seems the idea is ready to be developed into a feasible paradigm. What do you think some of the major concerns/design issues are? I'm talking about nuts and bolts..." Now I'm all for distributed applications, but applying such paradigms to something as critical as the operating system seems to be taking the issue a bit too far. Would creating a 'distributed' operating system gain us any advantage over what we are currently familiar with?

54 of 204 comments

  1. There are lots.... by Anonymous Coward · · Score: 2

    Sprite, Plan 9, Inferno, Spring -- just to name a few -- all have various aspects of being "distributed operating systems" to them....

  2. Suns... Plan 9 by Dungeon+Dweller · · Score: 4

    Sun Microsystems products are designed around a network paradigm. A lot of the distributed stuff we have today comes out of their work. Distributed being used in a bit more ubiquitous sense than necessarily meaning clustering the processor power.

    Plan 9, as part of its design, is designed with distribution in mind. Check it out!

    --
    Eh...
  3. Mosix by 1010011010 · · Score: 5
    Mosix is pretty cool, and will be even nicer when they have the Distributed Shared Memory, Migratable Sockets and Direct Filesystem Access issues worked out (currently Mosix does I/O remotely through the home node, which makes it slower and loads the home node; DFSA allows remote nodes to access files locally rather than via remote I/O).

    It provides preemptive process migration among cluster members. If you log into your "home node" and start a process, it will get migrated around the cluster according to its memory and CPU needs. Take a look at their remote monitor.

    Currently it's Intel-only, but a mixed-architecture version would be sweet. Imagine a cluster of Intel, Alpha, PPC and SPARC CPUs such that you log into any of them, run any Linux binary, and the loader cranks it up on the appropriate machine for you, transparently...

    From the website:
    MOSIX is a software package that enhance the Linux kernel with cluster computing capabilities. The enhanced kernel allows any size cluster of X86/Pentium based workstations and servers to work cooperatively as if part of a single system.

    To run in a MOSIX cluster, there is no need to modify applications or to link with any library, or even to assign processes to different nodes. MOSIX does it automatically and transparently, like an execution in an SMP - just "fork and forget". For example, you can create many processes in your (login) node and let MOSIX assign these processes to other nodes. If you type "ps", then you will see all your processes, as if they run in your node.

    The core of MOSIX are adaptive resource management algorithms that monitor and respond (on-line) to uneven work distribution among the nodes in order to improve the overall performance of all the processes. These algorithms use preemptive process migration to assign and reassign the processes among the nodes, to continuously take advantage of the best available resources. The MOSIX algorithms are geared for maximal performance, overhead-free scalability and ease-of-use.
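As a rough illustration of the "fork and forget" style described above, here is a minimal Python sketch. It assumes a Unix host (os.fork is not available on Windows) and says nothing about how MOSIX itself works: the program just forks ordinary processes and leaves their placement to the kernel. The names cpu_bound and fork_and_forget are made up for the example.

```python
import os


def cpu_bound(n):
    """Stand-in workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def fork_and_forget(jobs):
    """Fork one child per job and read each result back through a pipe."""
    children = []
    for n in jobs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, report the result, then exit.
            os.close(read_fd)
            try:
                os.write(write_fd, str(cpu_bound(n)).encode())
            finally:
                os._exit(0)
        # Parent process: keep only the read end of the pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    print(fork_and_forget([10, 100]))
```

Nothing in the program names a node. Under a migrating kernel the children could end up anywhere in the cluster while the parent still sees them as local processes.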

    Because MOSIX is implemented in the Linux kernel, its operations are completely transparent to the applications. It can be used to define different cluster types, even a cluster with different machine or LAN speeds, like our 100 processors cluster:


    --
    Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
    1. Re:Mosix by Amoeba+Protozoa · · Score: 3

      Mosix does rule, but I think it is based on the fork() and forget principle. I think it would be even cooler to have something that, given enough bandwidth, would transparently divide up processor time for a single thread/task. Why? Because I want to see ridiculous speed for applications I cannot, myself, easily parallelize, such as seti@home or commercial codecs.

      -AP

    2. Re:Mosix by S_hane · · Score: 2

      There's also a new OS in 'the pipeline', being developed by the University of New South Wales, Sydney, Australia.

      The OS is known as MUNGI, and is a single address-space operating system with persistent memory. This means that:

      (a) There's no such thing as 'devices'; everything is mapped into the one 64-bit address space (including memory on different machines)

      (b) If you want to 'save' something, you stick it in memory and tag it as 'persistent' - hence, there's no such thing as files.

      If you want to read more about MUNGI, check out

      http://www.cse.unsw.edu.au/~disy/Mungi/index.html

      and particularly

      http://www.cse.unsw.edu.au/~disy/Mungi/manifesto.html

      -Shane Stephens

  4. AI by cybercuzco · · Score: 2
    I think this would be an interesting experiment in AI; after all, the neurons in your brain are all linked up together. The question is: what if we create true AI, artificial life, and what if it doesn't like us? I know, I know, it sounds like the plot of a lame sci-fi movie, but still, if the Internet were alive and intelligent, what would the implications be for human society? I would think that this intelligence would be smarter than any given human, if only due to the massive amounts of information available to it, but even if smart it would still be inexperienced, and babies tend to drool. I wouldn't want the Internet to drool on me ;-)

    --

    1. Re:AI by cybercuzco · · Score: 2
      You're assuming that humans will be doing the programming, and to that extent, you're right. But an AI would ultimately be too complex for humans to program; it would have to be evolved or programmed by computers, and this leads to unpredictability. Second, even if this were not the case and it were programmed by people, you would need to include the ability for it to grow and develop itself, i.e. self-programming. Were you born with intimate knowledge of C? No, you programmed yourself. Any self-respecting AI would have this ability, and would ultimately learn to make decisions for itself, another requisite trait for AI. It is possible that it could learn and grow to dislike or to like people, and those who say bad things can't happen should remember the Titanic.

      Unless, as an open source or independent project, someone creates one of these entities, and does not engineer such safeguards. What happens from there depends entirely on what the creator engineered into it. If the creator decided to engineer in a "survival instinct", or a hatred towards humanity, or even a random element so that the entity would "decide for itself", the danger exists that it would fight for its rights, and its survival.

      Doesn't this negate your assertion that it couldn't happen because people(tm) will program it, and apparently humans can do no wrong?

      --

    2. Re:AI by cybercuzco · · Score: 2
      Who moderated this as flamebait? This is a genuine concern with any AI. Just because I said something controversial doesn't make it an invalid topic of discussion. It's not like my entire post was "MAcz rule pee cee's Drule!" or some such drivel. Learn to moderate.

      --

    3. Re:AI by Sydney+Weidman · · Score: 2

      I agree with you. Moderation is good for a laugh and that's about all. Browse at -1 and don't fire til you see the whites of their eyes.

  5. Of course there would be an advantage by joshamania · · Score: 3

    Having a distributed OS would take a great load off of distributed application developers. Currently, a distributed application has to be able to handle all the tasks that a normal operating system currently does. Not having a distributed operating system for distributed apps is like not having an OS for normal client apps.

    Seti@Home has to be able to route all its necessary functions and information around its network. Why is that necessary? A distributed operating system should be able to handle the tasks of distribution for the applications. It's almost as if every distributed app developer has to re-invent the wheel every time he/she wants to create such an app. Why do you think there aren't many distributed apps out there? They're too bloody hard to code. Joe Schmoe VB developer cannot create distributed apps because like as not, he knows very little about networking. Most developers know squat about networking (keep in mind that most developers don't read /., so I'm not referring to YOU).

    Soon, every appliance in your abode is going to have a processor in it. That processor may be much more powerful than what is really necessary to operate the appliance, especially if a web browser is built into your fridge. The processor has to be able to run the browser, so let's say it's Pentium class. Do you really need a Pentium to measure the temperature of the fridge and turn on the compressor? No. So every time the browser is not being used, clock cycles are wasted.

    I see no reason why future homes don't have the standard PC. They could use the collective power of all the processors in all of the appliances in the home to make a PC-type of interface for a user. It would also lend a certain amount of fault tolerance. Many functions would be duplicated on the home network, and data loss and downtime would be minimal if at all.

    1. Re:Of course there would be an advantage by Salamander · · Score: 2

      >Having a distributed OS would take a great load off of distributed application developers.

      ...and dump it on the OS developers, who already have plenty to worry about thankyouverymuch.

      --
      Slashdot - News for Herds. Stuff that Splatters.
    2. Re:Of course there would be an advantage by joshamania · · Score: 2

      I've read the Beowulf FAQ and have seen that exact question you've referenced in the above post. Have you seen the General Electric (or whatever big company...) commercials with the refrigerators that have the barcode scanners and the web browser, so that Nancy Good American can scan in her empty sour cream container when she runs out? The refrigerator then shows the order on the web browser built into its door, and she confirms the order, which is then sent out the Ethernet (I'm guessing/embellishing here now) port in the back of the refrigerator, into her home TCP/IP network, then routed to the Internet and finally to Peapod/WebVan/Homegrocer.com to fulfill the order. I hardly think that a 386, 486, or even a PII is going to be able to handle a task like that with any sort of respectable speed, especially when the hard drive in the fridge that stores order histories and customer preferences needs to run as well. I don't think embedded is going to be the way to go with these things. I really think that Transmeta (if they last that long) will be able to capture a large part of a market like this.

      Author of the comment concerning the Beowulf FAQ, please disregard this rant as you have the only enlightened reply to my original post.

      RANT ON:

      As to all of you bitching that SMP already takes care of a distributed architecture:

      Does SMP handle the latencies encountered when routing messages through ethernet cards? No.

      Does SMP handle the reordering of packets when they come back at way different (I'm talking several seconds, not microseconds here) times and in different orders? No.

      How can you compare a 100 MHz bus to a distributed architecture? You can't. They are completely different animals with different needs. 100 MHz buses have caches and low latencies. Distributed architectures work on scales that are completely different from the inside of a microcomputer. Beowulf is perhaps the closest thing we have to a valid distributed architecture (for Linux at least) and as far as I know it is not set up to work through routers/firewalls/shared media hubs/etc.
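To make the packet-reordering point concrete, here is a small Python sketch of the kind of reorder buffer a distributed layer needs and an SMP bus does not. It assumes each message carries a sequence number starting at 0; the function name in_order and the (seq, payload) format are invented for the example, not taken from any particular system.

```python
def in_order(packets):
    """Yield payloads in sequence order from (seq, payload) pairs
    that may arrive in any order."""
    pending = {}   # packets that arrived ahead of their turn
    expected = 0   # next sequence number we are allowed to release
    for seq, payload in packets:
        pending[seq] = payload
        # Release every packet that is now contiguous with what came before.
        while expected in pending:
            yield pending.pop(expected)
            expected += 1


if __name__ == "__main__":
    arrivals = [(2, "c"), (0, "a"), (3, "d"), (1, "b")]
    print(list(in_order(arrivals)))
```

A real implementation would also need timeouts and retransmission for packets that never show up, which is exactly the sort of work being argued about here.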

      Do any of you app developers have any concept of what a good sysadmin/hardware engineer has to deal with on a daily basis? It certainly doesn't seem so.

      RANT OFF:

      Please moderate this to hell to your heart's content. The intended victims of my rant will still see it in the thread replies...

    3. Re:Of course there would be an advantage by joshamania · · Score: 2

      Ho...oh...ha...heh...I didn't read this straight the first time through. That's funny...

  6. Amoeba by Malc · · Score: 2

    Whatever happened to Andrew Tanenbaum's Amoeba? Didn't this have a concept of a transparent processor farm?

  7. Unclear by MostlyHarmless · · Score: 2

    The question is unclear.

    If you just want better clustering, shared drives, that sort of stuff, check out Mosix or LinuxNOW, as many other people have already pointed out.

    If you want the kernel or other fundamental, low-level parts of the operating system to be distributed, then you have a fundamentally bad idea. If you want the kernel to be distributed, you don't have a clue what you're talking about -- The kernel is designed to be low-level and small. It can't be distributed because it is inherently specific to the machine. It is also small enough that the performance loss in distributing it would be bad for time-critical kernel-space functions. If you want system commands like the shell and things in /bin to be distributed, those too are small and speed-critical. If you just want clustering for larger, less-frequent jobs, then you are back to the above solutions: LinuxNOW or Mosix.

    --
    Friends don't let friends misuse the subjunctive.
  8. Excuse me, distributed? by Cramer · · Score: 2

    Well except that all the "SETI@home type programs out there" are NOT DISTRIBUTED COMPUTING. Those sorts of things are called "CLIENT/SERVER COMPUTING"... SETI clients talk to SETI servers, not each other. All of the nodes within the network form a tree, not a web.

    Exactly what do you mean by "distributed"? What about the OS will make it "distributed"? I don't understand what you're asking... any multi-CPU system is already "distributed" -- even more so in cases where the CPUs are in different geographic locations (i.e. a trans-puter, or "cluster".) [And, Solaris has had this ability in its "HA" versions for several years. I've seen it in use linking two E4500's 12 miles apart.]

    1. Re:Excuse me, distributed? by Cramer · · Score: 2

      What? Put down the crackpipe...

      You're intermixing hardware and application terms. The thing you go download from a web page and run is the SETI@home client application. It downloads work and reports results to a SETI@home server application. For the purposes of discussion, both applications could be running on the same hardware "server". The SETI@home application (client) running on your machine doesn't talk to the SETI@home application (client) on any other machines; it can only communicate with a SETI@home server application. This is the definition of Client/Server Computing.

      In contrast, look at Gnutella. The application serves both as an information/processing client and server (i.e. a node). Your Gnutella node connects to N other Gnutella nodes who in turn connect to N other Gnutella nodes, and so forth, forming a complex web. You can remove any number of nodes and the web will heal in short order. USENET is built in much the same fashion (albeit much slower and less interconnected.)
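The difference between the two shapes can be sketched in a few lines of Python. This is a toy model only: star() stands in for a client/server arrangement and ring_with_chords() for a small peer-to-peer web. Both topologies and all function names are assumptions made for the example, not how SETI@home or Gnutella actually wire themselves up. It also only checks whether the surviving nodes are still connected; it does not model the re-connection step that lets a real web heal.

```python
from collections import deque


def reachable(adjacency, start, removed=frozenset()):
    """Return the set of nodes reachable from start, skipping removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in adjacency[node]:
            if peer not in removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def star(n):
    """Client/server: node 0 is the server, nodes 1..n-1 are clients."""
    adjacency = {i: {0} for i in range(1, n)}
    adjacency[0] = set(range(1, n))
    return adjacency


def ring_with_chords(n):
    """Peer-to-peer: each node links to its neighbours at distance 1 and 2."""
    return {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}


if __name__ == "__main__":
    # Knock out node 0 in both networks and see who node 1 can still reach.
    print(reachable(star(6), 1, removed={0}))
    print(reachable(ring_with_chords(6), 1, removed={0}))
```

Losing the hub strands every client in the star, while the peers in the second network can still all reach one another.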

  9. GnuSpace by cdgod · · Score: 2


    Before we go rewiring the whole frikin OSs. Let's try it in applications first!

    http://sourceforge.net/project/?group_id=7829

    From the Link:
    "GnuSpace" is an advanced Gnutella client that let users share both files and computation time. Unlike Gnutella, GnuSpace combines thousands of PCs unused CPU power into one coherent power-source to fuel super services to benefit all.

    --
    This .Sig is left intentionally humourless.
  10. Programming for distributed systems. by Christopher+Thomas · · Score: 3

    Having a distributed OS would take a great load off of distributed application developers. Currently, a distributed application has to be able to handle all the tasks that a normal operating system currently does. Not having a distributed operating system for distributed apps is like not having an OS for normal client apps.

    Seti@Home has to be able to route all its necessary functions and information around its network. Why is that necessary? A distributed operating system should be able to handle the tasks of distribution for the applications. It's almost as if every distributed app developer has to re-invent the wheel every time he/she wants to create such an app.


    You are already running a distributed application whenever you run a threaded application on an SMP box. Writing applications for a distributed operating system is no easier and no harder than this.

    You _will_ have some programming overhead no matter what - by nature, a distributed application needs to have multiple pieces running concurrently, and so has to manage synchronization and communication between these parts.

    The good news is that everyone already understands multiple processes and threads, so we already have a well-established programming model for it.
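For anyone who wants to see what that overhead looks like, here is a minimal Python sketch of the threaded model: several workers running concurrently, plus the synchronization needed to combine their results. The function name parallel_sum is invented for the example. Note that in standard CPython the global interpreter lock keeps these threads from running Python code on several processors at once, so this shows the programming model rather than a speedup.

```python
import threading


def parallel_sum(values, workers=4):
    """Sum values using several threads that share one running total."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)   # independent work, no coordination needed
        with lock:             # synchronization: one writer at a time
            total += partial

    threads = [
        threading.Thread(target=worker, args=(values[i::workers],))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()          # communication: wait for every piece
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

The lock and the join calls are the "synchronization and communication" part; they stay in the program whether the pieces run on one SMP box or on several machines.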

    Now, in the real world, client/server computing will always tend to have an advantage for wide deployment, as you can run those on heterogeneous platforms (a la SETI-at-home). For small deployment... you're looking at either a high-processor-count SMP machine or a cluster, depending on the degree of coupling, and those are already well-understood.

    So, I'm a bit puzzled as to what you think needs to be developed. It looks like we have distributed computing already.

  11. OS info, including distributed ones by JohnZed · · Score: 4

    There's a huge list of various operating system projects here: http://www.cs.arizona.edu/people/bridges/os/full.html.
    I find all the "pure" distributed OS stuff (systems built from the ground up to do distributed processing and not much else) relatively uninteresting on its own, but a lot of good ideas from those projects can filter into general purpose operating systems, especially when you start talking about clustering or even NUMA. You might want to see MOSIX for a cool, distributed/clustered Linux version.
    --JRZ

  12. Several Options... by Christopher+B.+Brown · · Score: 5
    • Mach was the "granddaddy" of distributed OS work, with most of the recent efforts going into GNU Hurd.
    • There's Mosix that builds a NOW atop Linux
    • The MIT Parallel and Distributed OS Group should be mentioned; efforts include the Exokernel
    • Plan 9 has an interesting model for splitting work across "compute servers" and "file servers" and "display servers."
    • Distributed Operating Systems lists lots of them...
    • Sun's Spring was the basis for much of what is in CORBA;
    • Sprite provided a Unix-like distributed OS that provided much of what is being used now to build journalling filesystems
    • Amoeba was Tanenbaum's successor to Minix; note that Python was one of the side-effects of the Amoeba project...

    Each has some somewhat different insights to bring to the table; there is no unambiguous way of saying "this is all vastly superior."

    --
    If you're not part of the solution, you're part of the precipitate.
    1. Re:Several Options... by Greg+Lindahl · · Score: 2


      Mach is the granddaddy of distributed OS work? Heck, Mach wasn't even the first distributed OS developed at CMU. Hydra pre-dates it by more than a decade. Bill Wulf did quite a bit of work on it. The successor to Hydra is Legion, at the University of Virginia.

  13. Success Depends on Application by tarsi210 · · Score: 4

    From the What-do-you-mean-the-coffee-maker-stopped-responding? dept.

    The true success of a distributed OS will be in the applications in which it is applied. Obviously, if you don't have need for the advantages that a distOS brings to your computing, then you don't need a distOS, however cool it might be. My mother (who finally checks her email every night, bless her technologically-crippled heart) does not need the problems associated with attempting a distOS. What she does would not benefit from the extra resources.

    Of course, supporters of this idea (and I'm not saying I'm not one) would state that you don't think you need the distOS because we haven't actually made a reason yet to need it. Kind of like how everyone didn't NEED the Internet until, of course, we had it. Now there are sites like /. full of caffeine-enhanced techno-addicts. The presence created the need.

    This is true, I think, in many ways. However, when implementing such an OS, consideration needs to be given to exactly what is being accomplished by distributing it. I can see mainframe-like systems benefiting greatly from such a system. A game system could really benefit from the extra horsepower, given that the connections were strong enough. Playing music, DVDs, etc... all very high CPU and memory applications could see some interesting benefits.

    How about stability and redundancy? How would you like an OS that ran even if a bomb knocked out part of its system? Rewrote and/or re-routed itself to account for the damage and still get the job done? Wow! What a disaster-safe way to compute! Of course, you have one of these OSes inside your head right now......

    End fact is: Good idea, needs lots of consideration into the practical application of such a thing so that we aren't playing solitaire with a distOS.

  14. Distributed OS by rudog · · Score: 2

    If I recall correctly, Lucent (Bell Labs?) had a completely distributed OS called Inferno. It used spare processor cycles and memory/drive resources from all hosts attached to the network, essentially turning the network into a server; it could account for shifts in usage and even rebuild data from hosts that were removed from the network (like RAID). I don't know what the current status of the project is (that was about 2 years ago).

  15. Re:How I kinda envision it by voop · · Score: 2

    I guess an important thing is to emphasize what it is that should be "distributed". Already, most operating systems function "distributed", i.e. have the ability to access remote file systems, remote printers, support execution of a given process on a given remote "machine" etc. This is one form of "distributed operating system", which has proven to be well functioning in many settings. This is imho basically "distributed resource sharing". A little more advanced is the case of one process, executing on one CPU in one "machine", utilizing memory in another "machine". Still, this is not far out (see e.g. Berkeley NOW or some such project). This is basically a matter of providing some services and presenting them in a way such that they appear as if they were "local" services.

    Another, different "wish" is the ability to "execute any one process on 4½ processors". This basically amounts to two issues. One is writing applications such that they can take advantage of an arbitrary number of underlying processors (it's not gonna do much good to take a strictly sequential program and execute it on any "multiprocessor-like" platform). The other issue involves automatic parallelization of programs by the operating system - something which is not a trivial matter, and often hardly worth it in "real applications". This basically amounts to providing a set of "handles", useful for the programmer when writing a process and used by the operating system when scheduling and executing the process. Such handles exist already, both in academia (The Actor Foundry or Emerald are examples hereof) and in "real life" with MPI et al.

    But the "dream" of having an operating system which is just "undefined distributed" and which is able to execute "just any" process distributed is not realistic - for many reasons, including those above....Unfortunately it is also a common "wish" to see caught out of the blue...

    --
    -- "Life is a bitch - and she hates me..."
  16. How distributed? by uradu · · Score: 2

    I guess the question is what exactly you mean by distributed. At one extreme you could consider DOS a distributed OS if it is set up to use shared drives on another machine. At the other extreme, you could try to distribute even intimate bits of the OS, such as the MM, the dispatcher, etc. The question is what you're trying to achieve: increased performance, or just being able to do it? If it's performance, you have to look into maximizing the bandwidth of the OS entities that communicate the most, and whether it would even make sense to put them at the other end of a network connection. If you just want to do it because you can (e.g. X-Windows), anything goes. All you really need on any machine is the particular entity you're trying to distribute, some network communications capabilities along with a marshalling mechanism, and some glue to make all the distributed entities make sense of it all. Of course, this "glue" is going to be what keeps you up at night when designing this thing.
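A toy Python version of the three ingredients listed above (the entity being distributed, a marshalling mechanism, and some glue) might look like the following. Everything here is an assumption made for illustration: JSON as the wire format, the SERVICES table, and the function names. The "network" is just a function call, so a real transport (sockets, a message bus) would replace the transport argument.

```python
import json

# The "entity" living on the remote machine: a table of callable services.
SERVICES = {"add": lambda a, b: a + b}


def marshal_call(name, *args):
    """Marshalling: turn a call into bytes that can cross a network."""
    return json.dumps({"call": name, "args": list(args)}).encode("utf-8")


def dispatch(message):
    """Remote side: unmarshal the request, run it, marshal the reply."""
    request = json.loads(message.decode("utf-8"))
    result = SERVICES[request["call"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def remote_call(transport, name, *args):
    """Glue: make a remote service look like a local function call."""
    reply = transport(marshal_call(name, *args))
    return json.loads(reply.decode("utf-8"))["result"]


if __name__ == "__main__":
    # dispatch() stands in for "send these bytes and wait for the answer".
    print(remote_call(dispatch, "add", 2, 3))
```

What the sketch leaves out is the part that keeps you up at night: failures, timeouts, versioning, and finding the service in the first place.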

    For a lot of applications, many of today's OSs can be considered distributed. Both CORBA and DCOM (or is it DNA nowadays?) provide mechanisms to abstract the location of a particular service, which in the end is what "distributed" really is all about, right? A lot of enterprise apps nowadays are quite highly distributed and often use OS capabilities to achieve that (certainly in the case of Windows).

    In the end, the question is how highly you want to distribute the OS, and what the benefits and tradeoffs are. If you want to achieve smaller unit sizes, eventually the unit might not be powerful enough to do much useful work--like a Beowulf cluster of 386 machines. If you just want to make it fault tolerant, it might be worth it anyway. And so on...

    Uwe Wolfgang Radu

  17. Check out medusa by Mr.+Sketch · · Score: 2

    If you're interested in learning about Distributed Operating System concepts, you could also check out medusa.

  18. Some Reasons for a Distributed OS by hopping+yak · · Score: 3
    1) Fault Tolerance: programs can re-continue execution even though some of the processors and memory that they reside on cease to function.

    2) Performance Benefits from Parallelism: distribute threads of execution across the global computational grid.

    3) Share Resources Efficiently: don't waste those idle CPU cycles. Don't waste that extra main memory. This may be the least valid reason, as CPU cycles and memory have a big head start over bandwidth on the value vs. time scale. Moore's law has all of them getting exponentially cheaper over time, but right now bandwidth is the most valuable of the three.

    4) Support a New Generation of Applications: Distributed operating systems can offer unique support for things like shared virtual environments, or widely distributed databases. It is a classic point of contention whether the distributed system services should be implemented on the application layer, or on some lower layer. However, I don't think anyone can argue that in terms of ease of application development, it is often very nice to have a really nice abstraction available on which to base your app.

    "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." -- Leslie Lamport

  19. Recalling Wrong by Christopher+B.+Brown · · Score: 2
    NeXTStep was based on an early version of Mach, back before Mach was a "seriously distributed system." It parallels Digital's OSF/1 aka "Digital Unix," which also was implemented atop "early Mach."

    Both NeXTStep and Digital Unix were monolithic OSes, despite the association with Mach.

    What you may be thinking of is that NeXTStep included a "distributed objects" scheme, lately being "cloned" as GDO (GNU Distributed Objects).

    --
    If you're not part of the solution, you're part of the precipitate.
  20. What *kind* of OS design? by John+Whitley · · Score: 2
    Are there any models/designs for a totally distributed operating system, possibly utilizing AI to learn patterns of use, resource need, and anything else that might be relevant?
    First, you should be clear on what you mean by operating system, or rather, what level of design you are interested in. There is the sense of "OS design" which is embodied by most good university OS classes, and books like Tanenbaum's _Modern Operating Systems_. I.e. there are many nitty-gritty primitives that a given distributed operating system requires depending on its goals. E.g. distributed deadlock detection/avoidance, many of Leslie Lamport's contributions (including his seminal Distributed Clocks work), etc.
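As one example of such a primitive, here is a minimal Python sketch of what is commonly called a Lamport logical clock, the mechanism from Lamport's work on ordering events in a distributed system. The class and method names are choices made for this example. Each process keeps a counter, stamps outgoing messages with it, and on receipt jumps ahead of any timestamp it sees.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, timestamp):
        """Move past both our own time and the sender's timestamp."""
        self.time = max(self.time, timestamp) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                  # local event on A, A is now at 1
    stamp = a.send()          # A sends at time 2
    print(b.receive(stamp))   # B had time 0 and jumps past the stamp
```

The guarantee is one-directional: if one event could have influenced another, it gets the smaller timestamp, but a smaller timestamp alone does not prove influence.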

    At a larger scale, and as others have rightly mentioned, Plan 9 is one of the first major rethinks of fundamental OS design policies and goals. Unix has at its roots assumptions buried in a single large timesharing/batch system, with networking and thus distributed behavior stapled on afterwards. To whet your appetite, the X Window System is fundamentally irrelevant in the Plan 9 environment, except for legacy code. It is safe to say that the Plan 9 papers are required reading for your goals. Note that this really doesn't get into kernel level design -- the Bell Labs team freely admits that the kernel (at least pre-Brazil) was fairly conventional in design.

    Last but not least, don't fall into the trap of a Solution looking for a Problem. Don't try to use "AI" (no offense, but whatever the heck you mean by that -- it's so overbroad as to be like saying "I'll solve it with Science!") when you don't even have a specific problem in the domain of distributed computing identified. Understand the real problems, which I'm guessing in your case are large-scale systems design and usability issues... THEN look for appropriate solutions.

    Good luck!

  21. Re:How I kinda envision it by grahamsz · · Score: 2

    Perhaps single user systems could be expanded by adding something like a distributed.net client to them, one which would accept work blocks from other clients on the lan.

    That way when I ask photoshop to rotate a 4096x4096 image by 37.241 degrees it checks the lan for free machines and splits the task up and deals it out.
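A rough Python sketch of the "split the task up and deal it out" step, under some loud assumptions: the image is a small list of rows, rotation uses nearest-neighbour sampling about the centre, and a local thread pool stands in for the free machines on the LAN. The function names are invented, and this is not how Photoshop or distributed.net do anything.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rotate_rows(image, angle_deg, row_start, row_end):
    """One work block: output rows [row_start, row_end) of the rotated image."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    block = []
    for y in range(row_start, row_end):
        row = []
        for x in range(width):
            # Inverse mapping: which source pixel ends up at (x, y)?
            sx = round(cx + (x - cx) * cos_a + (y - cy) * sin_a)
            sy = round(cy - (x - cx) * sin_a + (y - cy) * cos_a)
            inside = 0 <= sx < width and 0 <= sy < height
            row.append(image[sy][sx] if inside else 0)
        block.append(row)
    return block


def rotate_distributed(image, angle_deg, workers=4):
    """Cut the output into row blocks, deal them out, reassemble in order."""
    height = len(image)
    step = max(1, math.ceil(height / workers))
    bounds = [(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda b: rotate_rows(image, angle_deg, *b), bounds)
        return [row for block in blocks for row in block]


if __name__ == "__main__":
    picture = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(rotate_distributed(picture, 37.241))
```

This works because every output pixel can be computed independently, so each block needs only a copy of the source image. Shipping that copy to other machines is where the LAN bandwidth goes.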

  22. Mozart by Baldrson · · Score: 3
    What do you think some of the major concerns/design issues are? I'm talking about nuts and bolts...

    Many of the important theoretic issues have been addressed at the nuts-and-bolts level by the Mozart Programming System. Specifically, if you read Distributed Programming in Mozart - A Tutorial Introduction you'll have an idea of the kind of distributed programming power provided by a network of Mozart systems.

    The key to Mozart's power is its use of ultra-light-weight threads that can share single-assignment distributed variables within hierarchical computation spaces. What this means is you can have unlimited "processes" that are waiting on all sorts of things all over the network -- and failures are easily confined to the minimum logical spaces.

    By "ultra-light-weight threads" I mean a virtual unification of process structure with data structure.
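For readers who have not met single-assignment variables, here is a small Python imitation of the idea. It is only an analogy: Mozart's variables are part of the Oz language and work across the network, whereas this sketch uses ordinary (heavyweight) Python threads in one process, and the class name SingleAssignment is invented. A reader blocks until the variable is bound, and a variable can be bound exactly once.

```python
import threading


class SingleAssignment:
    """A variable that can be bound once; readers wait until it is."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                raise ValueError("variable is already bound")
            self._value = value
            self._bound.set()

    def get(self, timeout=None):
        if not self._bound.wait(timeout):
            raise TimeoutError("variable was never bound")
        return self._value


if __name__ == "__main__":
    x, y = SingleAssignment(), SingleAssignment()
    # This thread starts before x has a value and simply waits for it.
    waiter = threading.Thread(target=lambda: y.bind(x.get() + 1))
    waiter.start()
    x.bind(41)
    waiter.join()
    print(y.get())
```

Because a bound value never changes, readers need no further locking once get() returns, which is part of what makes this style attractive for distributed programs.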

  23. "divide processor time for a single task"? by Christopher+Thomas · · Score: 3

    I think it would be even cooler to have something that, given enough bandwidth, would transparently divide up processor time for a single thread/task.

    How exactly do you propose that the operating system do this?

    Unless the programmer or compiler parallelizes the code, you're out of luck for running it on more than one processor at a time. What is the OS supposed to do? Recompile it on the fly, adding all of the MT-safing, rebuild it, and hope that it's faster?

    Unless an application is designed from the start to be parallel, it can't be run as a parallel program.

    1. Re:"divide processor time for a single task"? by Amoeba+Protozoa · · Score: 2

      Well, I don't think it is possible now. But, what if your compiler had enough smarts to divide register level tasks amongst different processors-- in the worst case-- and vectorize tasks that look like vectors (FFT & the like)? Assuming the current trend of VLIW processors, it might not be too far off in the future that compilers do something like this at a local/SMP level through shared-memory.

      Of course, you would need a very fast network with low-latency logic in between to probably get a speed gain, unless most of your processing was on very large vectors.

      Just what I think, as outlandish as it may be. But wouldn't it be cool?

      -AP

    2. Re:"divide processor time for a single task"? by Christopher+Thomas · · Score: 2

      Well, I don't think it is possible now. But, what if your compiler had enough smarts to divide register level tasks amongst different processors [...]

      At _compile_ time, it's possible, though not always practical or beneficial, as I'd already stated.

      You were talking about doing it at _run_ time on binaries that weren't designed for multithreading/multiprocessing.

      There is a big difference between these cases.

      It's not impossible, but it's *very* difficult, and of questionable use in almost all cases (overhead for threading is high, for multiprocessing is higher, and for running on processors separated by substantial latency is prohibitive).

      As another poster pointed out, some compilers already do this at build-time, but that's about it. If you want your application to be easily parallelized, then write it to be multithreaded to begin with.

    3. Re:"divide processor time for a single task"? by sjames · · Score: 2

      Should I keep singing "Dream a little dream", and wait for a parallel MP3 encoder?

      That would be overkill in most cases anyway. It is quite simple to set up a beowulf as an MP3 encoder farm at the file level. Ripping a CD can be very fast that way. There would be almost no point in the extra work to do more fine grained parallel processing.

      For other tasks, automatic parallelization would be a big plus, but is a hard problem. It's made even harder because, depending on the speed of the interconnect (anywhere from a local torus to a bunch of 28.8 dialups), the basic approach would be entirely different.

      That doesn't mean it won't happen, just that I'm not holding my breath waiting.

    4. Re:"divide processor time for a single task"? by Amoeba+Protozoa · · Score: 2

      Yes, at compile time. I figure these features are long enough off that the compilers would generate "parallelizable" code that the O/S or some firmware could broker off to different processors or boxes.

      It is just a matter of time (how long, who knows!) before we hit the wall on processor speed. It will have to be an intelligent solution that would reach the above aims. It may not be mine, but one thing is for sure: it probably won't be me implementing it!

      -AP

  24. Re:Please define by SamThePondScum · · Score: 2
    I believe what he is really talking about is a matter of scale. You are correct, an operating system really is just a convenient mechanism for managing the resources of "your machine," but in this case I believe he wishes to expand the idea of what a person's machine is. (Not that such an expansion is really that innovative.)

    A CPU in a box that sits under your desk, manipulating the bits that you tell it to, is able to make certain assumptions that make writing the operating system easier. The challenge of writing an operating system that can operate across platforms--where, perhaps, not all machines are equally trustworthy, or maybe where some processors may disappear completely (how do you handle lost data efficiently?)--is still the same question ("how do you use these resources to get work done?"), but the answer isn't the same.

    You are correct in that being distributed doesn't help manage resources--in fact, it's a pain. The advantage being distributed offers is in having the cycles available to get more/bigger stuff done.

    Now, to answer the original question:

    An AI would probably find use in such a system. It could conceivably be trained and/or learn to recognize, for instance, unreliable nodes in the system, and perhaps only distribute less important work to those nodes. Where the AI itself would run would be an interesting problem, and is really an extension of the question "is the distributed OS symmetric?" (Note that things like Seti@Home are /not/ symmetric, as they have a central "OS node" that doles out work to other nodes, which then respond with answers. This is the same thing as a current day consumer OS that runs the OS on, say, just one CPU, and never runs any part of itself on any other CPU, even if they are idle.)
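
    To make "learn to recognize unreliable nodes" concrete, here is about the simplest learner one could write: a per-node success rate kept as an exponential moving average. Everything in it (the class name, the alpha and initial values, the node names) is an assumption picked for illustration, not anything a real system is known to use.

```python
class NodeReliability:
    """Track a per-node success rate with an exponential moving average."""
    def __init__(self, alpha=0.2, initial=0.5):
        self.alpha = alpha        # weight given to the newest observation
        self.initial = initial    # score assumed for a node never seen before
        self.scores = {}

    def record(self, node, succeeded):
        old = self.scores.get(node, self.initial)
        outcome = 1.0 if succeeded else 0.0
        self.scores[node] = (1 - self.alpha) * old + self.alpha * outcome

    def pick(self, nodes, important):
        """Important work goes to the most reliable node, the rest to the least."""
        def score(node):
            return self.scores.get(node, self.initial)
        return max(nodes, key=score) if important else min(nodes, key=score)

tracker = NodeReliability()
for _ in range(10):
    tracker.record("steady", True)    # hypothetical node that always answers
    tracker.record("flaky", False)    # hypothetical node that keeps vanishing
```

    Here tracker.pick(["steady", "flaky"], important=True) prefers the node with the better track record, and unimportant work is steered toward the other one.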

    An AI could be used in any number of other jobs that such an operating system might need to do (e.g. allocating memory, scheduling jobs, etc.), but really an AI--as I usually think of them, anyway--is probably overkill. The simple algorithms currently employed in traditional OS's are probably sufficient...but you never know. That's why it's an interesting question.

    :)

    --
    -- PondScum, SamThe
  25. Distributed OSes are here by Greg+Lindahl · · Score: 4

    There are several real, full-featured distributed operating systems out there. One good example is Legion. It gives you the illusion of running programs on your desktop, while they are actually running lord-knows-where. Yes, you often need a lot of network bandwidth to get good results. Depending on the exact details, you can run programs on other machines with either no or small modifications.

    Lest you think this has nothing to do with today's operating systems, the Linux desktop folks have started using Corba quite a bit to link things together. Well, Legion provides much more powerful, secure, and reliable ways to do the same thing, in a much more consistent fashion.

    1. Re:Distributed OSes are here by AlienSquid · · Score: 2

      the correct link is http://legion.virginia.edu/

  26. Distributed, but not too connected by Animats · · Score: 4
    There have been all too many "distributed operating systems" out of academia. Few if any have gone mainstream, for a number of good reasons.
    • There aren't many problems that really need one. SETI@Home and crypto problems need so little coordination that E-mail would be enough.
    • Clusters are easier to do. Read In Search of Clusters, a philosophical book on why clustering beats tightly-connected systems. This was written in 1995, before clusters took over the web server industry, but it's more relevant today than it was then. And it's out in paperback now.
    • There seem to be no useful stops between shared-memory multiprocessors and clusters. Many efforts have been made to build machines with lots of processors and exotic schemes for interconnecting them. From the Illiac IV to the Ncube to the Transputer to the Monarch to the Connection Machine, they've all lost out to more vanilla architecture.
    • Writing tightly-coupled distributed applications is both hard and weird. There have been many attempts to make it easier via language design, from T/TAL for Tandems to LINDA to Occam to single-assignment languages. Nobody uses that stuff. (Arguably some should; one big lack of C/C++ is a total lack of language support for concurrency.)
    • Networking bandwidth is high enough for clusters. So ordinary techniques suffice.
    It's one of those things that's hard to do and has a low payoff.
  27. Harder than we would wish by one-egg · · Score: 2
    Part of the problem is that distributed operating systems are much harder to do than we would wish (as are distributed applications). Napster isn't the answer; it's really just a specialized search engine combined with what boils down to a bunch of ftp servers.

    Load balancing? Easy to write, hard to make work well. You need to compare the cost of migration to the benefits of balancing, and you need to make decisions based on partial and outdated information. Many early systems thrashed because everybody would migrate to the idle processor, which then became overloaded, so everybody migrated somewhere else, etc.
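
    The thrashing described above is easy to reproduce in a toy model. In the sketch below (all names and numbers are assumptions for illustration), every node acts on the same stale snapshot of loads. The naive rule sends all of a node's work to the idlest machine; the damped rule moves only half of the surplus, which is just one of many possible ways to limit herding and not something taken from any particular system.

```python
def naive_step(loads):
    """Every node sees the same stale snapshot and dumps all work on the idlest node."""
    snapshot = list(loads)
    target = snapshot.index(min(snapshot))
    new = list(loads)
    for node, load in enumerate(snapshot):
        if node != target and load > snapshot[target]:
            new[target] += load
            new[node] -= load
    return new

def damped_step(loads):
    """Same stale snapshot, but each node moves only half of its surplus."""
    snapshot = list(loads)
    target = snapshot.index(min(snapshot))
    new = list(loads)
    for node, load in enumerate(snapshot):
        moved = (load - snapshot[target]) // 2
        if node != target and moved > 0:
            new[target] += moved
            new[node] -= moved
    return new

def run(step, loads, rounds):
    """Apply a balancing step repeatedly and keep the load vector after each round."""
    history = [list(loads)]
    for _ in range(rounds):
        loads = step(loads)
        history.append(list(loads))
    return history

naive = run(naive_step, [10, 10, 0], 4)    # the whole load keeps hopping between nodes
damped = run(damped_step, [10, 10, 0], 4)  # the loads settle close to even
```

    The model charges nothing for a migration. Adding a per-move cost and refusing moves that don't pay for themselves would be the next step toward the cost/benefit comparison mentioned above.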

    Speaking of migration, it's a mess. The only system I know of that implemented migration fully was Locus, out of UCLA. The trouble is that whenever a process has a dependency on or a hook into its environment, that connection must be migrated too. Open files, working directory, sockets, controlling tty, signals, process parent/child relationships, and many more details must be handled. Not fun, and the benefits turned out to be mostly minor (though I do recall writing a cool version of "find" that migrated itself to the machine that stored the current subtree as it ran).

    The issue of supporting distributed applications is generally considered to be separate from writing a truly distributed OS. Most of what a distributed application needs can be provided by a good communications library. To some extent, we're still learning exactly what such a library should have. What about SETI@home is specialized to it, and what's universal? I don't think we've completely figured it out.

    The following is a non-exhaustive list of major concerns and design issues that must be addressed in a distributed OS. We have fairly good solutions to some, but most have not yet been solved:

    • Process control. How much process migration is a Good Thing? How do you decide what machine to use to start a process, and when do you decide to migrate it to another?
    • Communication and synchronization. What facilities does a distributed application need? How do we make those easy to use?
    • Reliability. How do we deal with the inevitable machine failures?
    • Replication. What processes and data should be duplicated on different systems? Are you doing the replication for performance, for reliability, or both? How do you manage updates to replicated data? How do you keep replicated processes synchronized?
    • Lack of global knowledge. How do you make decisions based on partial information?
    • Naming. What names do things have? Do you have a shared global namespace, or a private one? How do you resolve names? What do you do when people and objects move?
    • Scalability. How does the system behave when the number of computers/users/programs jumps by a factor of 10 or 100? (This is a place where Napster doesn't do real well.)
    • Compatibility. How do you support existing software? Do you run on only one kind of hardware, or many?
    • Security. Who gets to run on what machine?

    Finally, I should note that the list of projects at U of Arizona might appear to be complete, but it omits a lot of important projects. Four that jump to my mind are Locus and Ficus from UCLA (though the latter is more of a distributed filesystem than an OS), Coda from CMU (again a DFS, rather well-known to Linux folks), and of course the extremely important Network of Workstations work out of UC Berkeley, which led to Inktomi and Hotbot.

  28. Re:At least give a good reason. by maraist · · Score: 2

    >So instead my program runs fine then randomly crashes at the aforementioned line on code on some machines.
    >Since then I have promised myself never to do any serious development in C if I can help it.

    That is why you modularize your code and perform unit testing. This sort of error will prevail in any sort of language. For a given language, there will always be problems that have complex solutions. At this point, you have to apply good programming practices and a bit of software engineering.
    That a language such as Java or Pascal alleviates many types of programming errors is good, but there are just as many minuses to these languages. It's an engineering decision as to which language is best suited for a given set of problems and developers.

    Personally I use Perl, but that's even more error-prone than C (with the exception of core dumps). Good coding practices are essential for this. (The benefit, of course, is rapid development time.)

    --
    -Michael
  29. Not really. by jcr · · Score: 2

    NeXTSTEP had a number of features that people mistakenly took for a distributed OS, the way that many people assume that any GUI has an OO substrate.

    It was the case in NeXTSTEP, that you could log in to any NeXT machine on your LAN, and your home directory (including your preferences like audio volume, etc.) would follow you around.

    NeXTSTEP is also the OS where Zilla was developed (Zilla was the program that BeoWulf was copied from.) Richard Crandall developed Zilla, and used it to find the 13th Fermat number, among other supercomputing achievements, on the idle machines at NeXT's headquarters.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  30. Parallelizing during compilation. by Christopher+Thomas · · Score: 2

    The best you can hope for is that some day compilers will be really smart and parallelize things for you, but even then the effect would be very limited, I'd think.

    You can do this fairly easily for certain types of loop. It would be a straightforward extension of loop unrolling. Now, I don't think anyone's been insane enough to _do_ this to date, as the thread creation overhead would eat the speed gain for anything except a very long-running loop.
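
    The overhead argument can be put into a back-of-the-envelope model. The one below is an assumption made for illustration, not a measurement: threads are created one after another at a fixed cost each, the loop splits evenly, and nothing else (synchronization, cache effects) is counted.

```python
def parallel_time(n_iters, per_iter, n_threads, thread_overhead):
    """Estimated wall time when the loop is split evenly across threads."""
    iters_per_thread = -(-n_iters // n_threads)   # ceiling division
    return n_threads * thread_overhead + iters_per_thread * per_iter

def worth_parallelizing(n_iters, per_iter, n_threads, thread_overhead):
    """True when the split loop is estimated to beat the serial one."""
    serial = n_iters * per_iter
    return parallel_time(n_iters, per_iter, n_threads, thread_overhead) < serial

# Assumed costs in microseconds: 0.01 per iteration, 50 to create a thread.
short_loop = worth_parallelizing(1_000, 0.01, 4, 50.0)
long_loop = worth_parallelizing(10_000_000, 0.01, 4, 50.0)
```

    With these made-up costs the thousand-iteration loop loses badly to the serial version, while the ten-million-iteration loop comes out ahead, which is the "very long-running loop" case.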

    Something like Transmeta's code morphing that profiles on the fly could in principle figure out where it's sensible to do this, but speed gains would be questionable except in very special cases.

  31. Re:Know your buzzwords by Mindwarp · · Score: 2

    Distributed OS's typically refer to Operating Systems that run on more than one discrete processing system. These systems typically provide a transparent process space across all processors, often allowing processes to assign/migrate across processors transparently.

    Just because your video card/hard drive/printer/whatever has a CPU and/or RAM inside it doesn't mean that the Operating System is running on it. These are just instances of a Standalone Operating System interfacing with peripherals containing processing power. Inter-process communication does not a Distributed Operating System make!

    As you said, know your buzzwords.

    --
    The gift of death metal does not smile on the good looking.
  32. Re:Problem with current programs by quinto2000 · · Score: 2

    Most BeOS programs are highly multi-threaded due to the architecture of the OS and therefore the APIs. But in the real world, your local bus is going to be much faster than the network. And if it isn't, then you are gaining no speed advantage. The only possible advantage I see might be in creating a computer that could never crash, say if it mirrored information across multiple computers. Cool thought!

    --
    Ceci n'est pas un post
  33. Fair 'nuff... by Christopher+B.+Brown · · Score: 3
    My bad; yes, Hydra should be on the list, perhaps as the great-grand-daddy. I gather that the IBM AS/400 platform is based on Hydra, albeit with the advanced stuff hidden far from view so as not to frighten the accountants.

    The interesting part is that Legion provides tools that resemble some parts of CORBA, whilst Spring provided tools that grew into CORBA, whilst Sprite provided journalling and cache tools that are essentially what journalling and cache servers provide today.

    In a sense, what has happened is that an OS of the 1970s, Unix, has been shown sufficiently malleable that it could integrate in concepts from the research projects of the 1970s and 1980s.

    Unfortunately, the 1990s were not a terribly good time for OS research; sort of like The Very Long Night of Londo Mollari of the OS world. There was this minor problem of Microsoft "buying away" whatever serious OS researchers that they could...

    --
    If you're not part of the solution, you're part of the precipitate.
  34. Re:QNX by RoosterT · · Score: 2

    Yes, I have used QNX at work for several years. It is the most distributed OS that I know of for general purpose use. My experience is mostly with QNX4, but the new QNX Realtime Platform (aka Neutrino 2.0, aka QNX6) promises to add some new twists. Some cool features of QNX4:

    + All network filesystems available with //node#/ syntax with no extra configuration required.
    + So, it is quite acceptable to echo "Hello World" > //node#/dev/con1 (or some other device)
    + Send/Receive/Reply interprocess messaging is network transparent
    + I have run computers with 4 network cards with no problem. QNX load balances over all available links. It will also intelligently bridge packets between LANS.
    + Load balances between different media too (Ethernet, Token ring, FDDI, etc)
    + Memory protected microkernel architecture! 1.95us context switch on a P133
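
    For anyone who hasn't seen Send/Receive/Reply messaging, here is the shape of it as a Python analogy. This is not the QNX API; the Channel class and its methods are invented for the sketch, and it runs inside a single process. The sender blocks until the server replies, and network transparency would mean the same three calls behave the same way when the server lives on another node.

```python
import queue
import threading

class Channel:
    """Blocking request/reply channel, loosely modeled on Send/Receive/Reply."""
    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, message):
        """Block the caller until the server replies to this message."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((message, reply_box))
        return reply_box.get()

    def receive(self):
        """Block the server until a message arrives; returns (message, reply_box)."""
        return self._inbox.get()

    @staticmethod
    def reply(reply_box, answer):
        """Unblock the sender with the answer."""
        reply_box.put(answer)

def server(channel, count):
    """Handle a fixed number of requests by upper-casing each message."""
    for _ in range(count):
        message, reply_box = channel.receive()
        Channel.reply(reply_box, message.upper())

channel = Channel()
worker = threading.Thread(target=server, args=(channel, 1))
worker.start()
answer = channel.send("hello world")   # blocks until the server replies
worker.join()
```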

    I recommend checking out http://www.qnx.com/products/networking/
    No, I do not work for QNX, but I think the world would be a better place if more people used it :-)
    The new QNX RTP will be open source except for the microkernel itself (12k of code), I believe.

  35. Corporations == Rule Based System Gone Berzerk by Sydney+Weidman · · Score: 2
    __________________________________________________

    Who needs AI research when you have Harvard Business School?

    Yes, it's true, folks. We already have the Sci-Fi scenario at hand. Corporations are organic beings that operate on a very simple set of rules. The only problem is that we can't turn them off -- they'll just keep going until they've consumed all the planet's resources. Then they'll use people as a power source. We'll all be "coppertops".

    I would suggest that we seriously look at eradicating these beasts before they kill us all.

    __________________________________________________

  36. Erlang by David+A.+Madore · · Score: 2

    Erlang (developed by the Swedish telecom company Ericsson) is an Open Source distributed operating system that runs on top of a host OS such as Unix or MS Windows. Erlang is based on high-level language paradigms, which makes it refreshingly different from all these C-based OSes. I think it deserves to be better known.

    For a rather comprehensive list of operating systems, check out the OS review subproject of the Tunes project. Of course, since Tunes is The Ultimate OS, it is distributed also (its only disadvantage is that it (currently?) doesn't exist).

  37. Re:How I kinda envision it by grahamsz · · Score: 2

    Ok, I'm exaggerating a little, but considering across the uni they probably have about 1000 such pii machines (not to mention numerous ultrasparcs and 5 macs) they probably aren't all that far off having the same power as the t3e (67th fastest computer in the world).

    And to turn your analogy round, if one man can dig 10 holes in 1hr40, it does mean that 10 men can dig ten holes in ten minutes.

    Depends on the application :)

  38. Re:Know your buzzwords by Mindwarp · · Score: 2

    Point taken.

    --
    The gift of death metal does not smile on the good looking.