Slashdot Mirror


Interview with Matthew Dillon of DragonFly BSD

JigSaw writes "Well-known FreeBSD/DragonFly/Linux/Amiga system hacker Matthew Dillon discusses a number of interesting points regarding where the BSDs are going, the status and goals of his latest project DragonFly BSD, the status of his innovative Backplane distributed database, his exciting plans to develop DragonFly into a transparently cluster-capable system implementing native SSI (Single System Image) which is something that no other operating system can do today, and more."

14 of 233 comments

  1. Re:Different threading model by Mr. Darl McBride · · Score: 5, Informative
    It looks like the gist of the threading model for Dragonfly is that threads all stay on one processor. I assume this is for user processes only, and that this isn't pervasive through the kernel?
    Nevermind, found an overview here.
  2. Re:Different threading model by Kaladis Nefarian · · Score: 5, Informative

    No, this is to do with kernel threads. The userland threading is the same as in FreeBSD 4.x atm, AFAIK. The idea is to keep the model simple, unlike in FreeBSD 5.x, where they are having trouble keeping it all sane with their fine-grained mutex model. Have a look at the dragonfly.kernel newsgroup on nntp.dragonflybsd.org for more details on the SMP model; Matt talked about it regularly earlier on.

    --
    * Several monkeys are here, playing banjos and wearing small hats.
  3. Something no other OS can do? by fmayhar · · Score: 5, Informative

    It's simply not true that "a transparently cluster-capable system implementing native SSI" is "something that no other operating system can do today." We were doing it at Locus in 1994 with SVR4, then with Tandem in 1996 with NonStop Clusters for Unixware. Now some of the same folks at HP have introduced OpenSSI, which is essentially the same code, less all the Unixware-related bits, ported to Linux and placed under the GPL. They are coming up hard on their 1.0 release, which is not bad for five people and such a large task.

    OpenSSI is the real thing: it has processes that migrate from node to node, distributed file systems, the works. And it's running now on clusters literally all over the world. (Not many clusters, true, but maybe that will change if the Slashdot crowd finds out about it.)

    I'm happy to say that there's a lot of my code in that system, as well.

    I know a little about what Matt wants to do with his SSI in Dragonfly, but he should certainly take a look at OpenSSI; we had to solve a lot of the problems you run into when you build such a beast.

    (And a beast it is. As complex as a kernel can be, when you have what is essentially a distributed kernel across several nodes, the complexity goes up by orders of magnitude. Makes tracking down those weird hangs pretty exciting, in a painful, time-consuming kind of way.)

  4. Re:SSI? by Kaladis Nefarian · · Score: 5, Informative

    If you read the article, Matt says (about SSI): "It is something that no non-commercial system today can do"...

    --
    * Several monkeys are here, playing banjos and wearing small hats.
  5. Re:Different threading model by m.dillon · · Score: 5, Informative
    Not exactly. All this means is that threads do not migrate preemptively, nor do they migrate while blocked or switched out while in kernel mode. Threads only migrate if (a) the thread itself wants to move to another cpu or (b) the thread is returning to user mode and the userland scheduler decides to migrate the thread to balance the load out (which only applies to threads associated with user processes since no other type of thread can 'return to usermode').

    Kernel threads almost universally stay on the cpu they were originally assigned to. High performance threaded subsystems, such as the network stack, are replicated. That is, the network stack creates multiple threads (one per cpu) and those threads do not migrate because, obviously, they do not need to.

    Generally speaking, the purpose of making thread migration explicit instead of automatic is to partition a larger data set across available cpu caches rather than cause the same data to be shared amongst all cpu caches. The processors operate a lot more efficiently and SMP scales a lot better. Most people do not realize the horrendous cost of moving threads between cpus because the cache mastership change is invisibly handled by hardware, but the cost is still there and still very real.

    -Matt
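
    The partitioning idea described above can be sketched as a toy model in Python. This is only an illustration, not DragonFly code: the names NCPUS, owner_cpu(), PerCpuWorker and dispatch(), and the hash-based ownership rule, are all assumptions made for the example. It shows one replicated worker per cpu, with each connection's state touched by exactly one worker.

```python
# Toy model: one worker per cpu, each owning a private slice of the data set.
# NCPUS and the hash-based owner_cpu() rule are illustrative assumptions,
# not taken from the DragonFly source.
import zlib
from collections import defaultdict

NCPUS = 4


def owner_cpu(conn_id: str, ncpus: int = NCPUS) -> int:
    """Map a connection to the one cpu whose worker will always handle it."""
    return zlib.crc32(conn_id.encode()) % ncpus


class PerCpuWorker:
    """Stands in for one replicated, non-migrating kernel thread."""

    def __init__(self, cpu: int):
        self.cpu = cpu
        self.state = defaultdict(int)  # touched only by this worker

    def handle(self, conn_id: str, nbytes: int) -> None:
        # Accumulate per-connection byte counts in this worker's private state.
        self.state[conn_id] += nbytes


workers = [PerCpuWorker(cpu) for cpu in range(NCPUS)]


def dispatch(conn_id: str, nbytes: int) -> int:
    """Route work to the owning cpu's worker and return that cpu's index."""
    cpu = owner_cpu(conn_id)
    workers[cpu].handle(conn_id, nbytes)
    return cpu
```

    Because the same connection always lands on the same worker, its state never needs to move between workers, which is the property the comment attributes to explicit (rather than automatic) migration.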

  6. Re:Michael by Waffle Iron · · Score: 2, Informative
    I think it was intended to be a compliment, as in:

    "The three chief virtues of a programmer are: Laziness, Impatience and Hubris." -- Larry Wall

  7. Re:For a project that gets no press by Anonymovs Coward · · Score: 3, Informative

    They do have ISOs, click the "download" link on their main page. The ISO is a liveCD, so you can boot your computer with it, like knoppix (no X or GUI stuff though). What they don't have is a friendly installer. But the /README file has detailed instructions on installing it to the hard disk, which should be easy if you have BSD experience, or if you're a brave newbie you can try it anyway.

  8. Re:Divide and conquer by Anonymovs Coward · · Score: 4, Informative
    From that interview, it sounds like DragonFly is going to have a different package management system in the future. Which means either the base is going to change,

    The BSD base isn't packaged. BSD types like having a source tree for their entire base system and being able to do "make buildworld" and "make installworld" to upgrade it. The package management system is entirely for third party applications. This is not like Debian or Gentoo, which maintain no code of their own other than installation and package management stuff. The BSDs maintain the kernel, the libc, other key libraries, and all the base utilities like ls, cp, mount, etc. And there's also a lot of "contrib" software in the base system -- some of it necessary to build the system (gcc and binutils), some of it just there out of tradition or regarded as "too useful to be moved to ports" (bind, sendmail).

  9. Re:I guess that'll show em. by Rick the Red · · Score: 4, Informative
    I feel that from an administration standpoint with a large number of hosts it wouldn't matter if you were using RedHat, Gentoo, FreeBSD, OpenBSD, or any other *nix for that matter as long as the machines you were running were using the same distro.
    You haven't actually been an admin at a company with a large number of machines, have you? I worked for a large aerospace company and our Management (he wasn't even a PHB) wanted to know why we had an average of one admin for 20 machines when HP said one admin should be able to handle 200. Then HP explained that those 200 machines were absolutely identical -- same exact hardware, same exact OS patch level, and same exact applications. In the Real World, we had no two machines alike and thus needed the 1/20 ratio. And this was all the same brand of hardware and OS! Each department was different, which basically made vacation and illness backups a matter of "pray they don't call you." The admins who had the easiest time of it were those who worked on BSD boxes; the VR4 boxes were all over the map; even the users understood that if their admin was away, they were better off not bothering the backup on call for any more than password resets because they'd as likely break something else as fix your problem.

    Granted, if you ran an all RedHat shop or an all Mandrake shop things would be easier than simply an all Linux shop, but the same would be true for an all OpenBSD shop vs an all FreeBSD or NetBSD shop. But if each department is free to buy what they want I'd rather find who-knows-which-BSD on the box than who-knows-which-Linux.

    --
    If all this should have a reason, we would be the last to know.
  10. Re:I guess that'll show em. by yanestra · · Score: 3, Informative
    Merely my brief experience with Gentoo, when they first upgraded glibc (from 2.2 to 2.3 iirc) and broke half the packages, then downgraded it again and broke everything else. This is really a pet peeve: aren't minor versions supposed to be compatible?
    There's a big difference between binary executable and logical compatibility. You compiled several files against the newer libraries and then downgraded. You expected the newly compiled programs to run with the old libraries, which they in fact do, as long as no special feature of the newer libraries has been used.

    If you are unhappy about your executables being broken, simply keep a copy of the older libraries. (With Gentoo, simply delete the old package file in /var/db/pkg before updating.)

  11. Re:Different threading model by m.dillon · · Score: 4, Informative
    The system core is lockless... there are no mutexes. For example, the LWKT scheduler core operates entirely without mutexes. Only critical sections are used to protect against local interrupts. A critical section is per-cpu, and really only needs to increment and decrement a per-cpu variable. As such the crit_enter()/crit_exit() code does not need to use any locked bus-cycle instructions or anything fancy at all, really.

    The LWKT scheduler on any given cpu is only allowed to operate on threads owned by that cpu. If you attempt to wake up a thread owned by a different cpu, an asynchronous IPI message is sent to the target cpu's LWKT subsystem requesting that the specified thread be woken up. It's really that simple. Same goes for cross-cpu scheduling.

    IPI messages themselves are lockless and require no mutexes to operate because the cpu-to-cpu messaging uses a software crossbar (array of FIFOs) approach.
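
    A rough way to picture the two mechanisms described above (a per-cpu critical-section counter and cross-cpu wakeups sent through an array of FIFOs) is the toy Python model below. It is a sketch under assumptions, not the LWKT implementation: the Cpu and Thread classes, the runq list and the process_ipis() method are hypothetical names, and giving each cpu one inbound FIFO per sender is just one reading of "software crossbar".

```python
from collections import deque


class Thread:
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner  # index of the cpu that owns this thread


class Cpu:
    def __init__(self, cpu_id, ncpus):
        self.cpu_id = cpu_id
        self.crit_count = 0  # per-cpu nesting counter; never shared
        self.runq = []       # names of threads made runnable on this cpu
        # One inbound FIFO per sending cpu: only that sender appends to it
        # and only this cpu drains it (assumed layout of the crossbar).
        self.ipi_in = [deque() for _ in range(ncpus)]

    def crit_enter(self):
        self.crit_count += 1

    def crit_exit(self):
        assert self.crit_count > 0, "unbalanced crit_exit"
        self.crit_count -= 1

    def wakeup(self, thread, cpus):
        if thread.owner == self.cpu_id:
            # Local thread: touch only this cpu's own scheduler state.
            self.crit_enter()
            self.runq.append(thread.name)
            self.crit_exit()
        else:
            # Remote thread: queue an asynchronous request for its owner.
            cpus[thread.owner].ipi_in[self.cpu_id].append(thread)

    def process_ipis(self, cpus):
        # Drain every inbound FIFO; each queued thread is owned by this cpu.
        for fifo in self.ipi_in:
            while fifo:
                self.wakeup(fifo.popleft(), cpus)
```

    In this model a cpu never writes another cpu's run queue directly; it only appends to a FIFO for which it is the sole producer, and the owning cpu performs the actual wakeup when it drains the queue.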

  12. Re:Different threading model by m.dillon · · Score: 4, Informative
    No, the critical section code simply increments and decrements a per-cpu variable. No locking is required at all... critical sections are local to the cpu, not global to the system.

    In regards to cache issues, let's say you have a quad opteron system. Each cpu has a 1MB L2 cache. If you migrate threads willy nilly you basically wind up in a situation where each of the four cpus' L2 caches contains the same data. In effect, you wind up with a system that globally has only a tad more than 1MB of L2 cache. If you partition data (such as TCP protocol data) across distinct threads, and place those threads on different cpus, then you are in effect partitioning your system's memory across all four cpu caches and you wind up with a system that globally has 3-4MB of L2 cache instead of 1-2MB.

    There are two costs being saved here. (1) the cost of having to go to main memory when a piece of data is not in the L1/L2 cache, which can run into the hundreds of cpu cycles, and (2) the cost of cache mastership changes for all the data associated with the thread that was migrated (repeated each time the thread migrates).

    -Matt
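
    The cache arithmetic above can be made concrete with a small back-of-the-envelope calculation. The function below is a deliberately simplified model, not a measurement: effective_l2_mb() and its shared_fraction parameter (the portion of each cache assumed to be duplicated in every other cache) are assumptions introduced for the example.

```python
def effective_l2_mb(ncpus: int, l2_mb_per_cpu: float, shared_fraction: float) -> float:
    """Estimate how much distinct data fits across all L2 caches, in MB.

    shared_fraction is the part of each cache holding data that is also
    present in every other cache; duplicated data is counted only once.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    shared = l2_mb_per_cpu * shared_fraction                   # counted once
    private = l2_mb_per_cpu * (1.0 - shared_fraction) * ncpus  # counted per cpu
    return shared + private
```

    For the quad-cpu, 1MB-per-cpu case in the comment, fully duplicated caches (shared_fraction=1.0) give 1MB of distinct data, while fully partitioned caches (shared_fraction=0.0) give 4MB; values in between land in the ranges the comment mentions.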

  13. Re:Different threading model by m.dillon · · Score: 4, Informative

    I don't think it's anyone's design in particular, but I tend to sit down and write things from scratch rather than copy other people's ideas. In the case of the thread replication used by the network stack, it is primarily Jeffrey Hsu's work and since he is big on reading papers I'm sure it's a combination of his own design and ideas gleaned from various published papers.

  14. Re:Different threading model by m.dillon · · Score: 2, Informative
    It's an advantage in concept. The serializing token abstraction is a far easier model to program to because the semantics are totally different than mutexes. For example, when you obtain multiple mutexes you have to be very careful about lock ordering issues, and when you have one major subsystem holding a mutex call another major subsystem that might block, you have to (in FreeBSD-5) make the second subsystem aware of the first subsystem's mutexes, creating major code pollution all over the place.

    The serializing tokens used by DragonFly work differently. They only guarantee serialization while the thread holding the token is actually running. Other threads holding the same token will be allowed to run when the first thread blocks or switches away synchronously, and the original thread will not get the cpu back until the tokens it is holding are available again.

    This means that threads can obtain tokens in any order they wish, and that threads can hold tokens across blocking situations or calls to other subsystems without having to tell those subsystems about it. It may seem like a small thing, but the result is a huge simplification of the programming model. The tokens act almost like mini-BGL's (Big Giant Locks) but have the added advantage of protecting against interrupt threads trying to hold the same token. We are planning to expand the token idea further into a shared/exclusive model. The shared/exclusive model would have characteristics very similar to RCU.

    The actual internal implementation of our token code is also quite a bit more flexible, allowing us to rip the guts out and rework it as needed for performance without changing the abstraction.

    -Matt
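
    The token semantics described above can be illustrated with a toy Python model. This is a sketch of the behaviour as the comment describes it, not DragonFly's token code: the Token and Thread classes and the can_run(), switch_in() and block() helpers are hypothetical names. A thread keeps its tokens while blocked, but they only exclude other holders while it is actually running.

```python
class Token:
    def __init__(self, name):
        self.name = name
        self.running_holder = None  # the thread currently running with this token


class Thread:
    def __init__(self, name):
        self.name = name
        self.tokens = []     # tokens held logically, acquired in any order
        self.running = False

    def acquire(self, token):
        # Just record the token; acquisition order carries no meaning here.
        self.tokens.append(token)


def can_run(thread):
    """A thread may run only if none of its tokens is in use by another running thread."""
    return all(tok.running_holder in (None, thread) for tok in thread.tokens)


def switch_in(thread):
    """Try to give the thread the cpu; return False if its tokens are not available."""
    if not can_run(thread):
        return False
    for tok in thread.tokens:
        tok.running_holder = thread
    thread.running = True
    return True


def block(thread):
    """The thread blocks: it still holds its tokens, but stops excluding other holders."""
    for tok in thread.tokens:
        if tok.running_holder is thread:
            tok.running_holder = None
    thread.running = False
```

    In this model two threads can take the same pair of tokens in opposite orders without deadlock, because nothing is ever waited on while holding: a thread that cannot run with all of its tokens simply is not scheduled until they are free.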