Computer Science Curriculum Using Linux?
I couldn't resist posting this question from Kris Warkentin: "I am helping a professor at my school develop some projects for a third-year Operating Systems course. I told him that Linux would be good for that sort of thing, both as an example and as fodder for development. It is a single term (13 weeks) and students in Computer Science, while competent, are not exactly experienced programming wizards like Alan Cox. So, the question is, does anyone know of any nice little Linux-based programming projects which would give a feel for the OS internals? Maybe some of you have actually taken a course where you wrote a device driver or something? Any ideas or suggestions would be welcome."
This is a real cool idea. Are there any other schools doing something like this with Linux?
We also use Operating Systems: Design and Implementation (2nd edition) by Tanenbaum and Woodhull as a textbook; it includes Minix media and goes into great depth about the Minix OS.
The Linux kernel may not be as well documented for educational purposes.
One of my OS courses involved loading BSD3.0 on a machine. The instructor gave us an overview of the internal structures, and by the end of the course we had to make a significant modification to the kernel. The modification didn't need to extend the product in a meaningful way; it just had to interact with the kernel in a manner that demonstrated we understood it. For example, my project was to create a new FS that would encrypt/decrypt the data being written to or read from it. Modifications were made to utilities such as mount to recognize the new FS and ask for passwords. The encryption algorithm was just XORing the data with 1, but the point of the exercise was not to design strong encryption but to show knowledge of the kernel internals. With Linux it would be easy to instruct the students on the structure of the kernel and how to extend it with modules. Have them write simple programs to drive printers through the serial port. Get them debugging kernel code. Once you are elbow deep in it, new and more complex projects will present themselves, such as different scheduling algorithms. Don't underestimate what the students might be capable of.
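For anyone wondering how little the "encryption" itself involves, here is a minimal user-space sketch in Python. It is not the original kernel code (which was presumably C inside the filesystem's read/write paths); the function name xor_transform and the sample data are made up for illustration. The point it shows is that XOR with a fixed key is its own inverse, so one routine can serve both the write path and the read path.

```python
def xor_transform(data: bytes, key: int = 1) -> bytes:
    """XOR every byte with a one-byte key (0-255); applying it twice restores the input."""
    return bytes(b ^ key for b in data)


plaintext = b"hello, filesystem"
on_disk = xor_transform(plaintext)    # what a write path would store
read_back = xor_transform(on_disk)    # what a read path would hand back
```

The hard part of the project is everything around this function: hooking it into the kernel's filesystem code and teaching mount about the new type.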
At my school we are using Linux for our OS class now. We tweaked the RT scheduler and added a new type instead of just FIFO; it was similar, but it allowed jobs to be prioritized before they started executing. Starting with the proc entry was a good place to get going. The device driver was also a very good project. The device driver we implemented was just a message queue, i.e. a /dev/msg entry where you would open the device and write some string, which was stored in the kernel; then reads on the device would return the messages in the same order. Good project. For the final project we are working on a file system. They don't think we can code well enough, so it's all simulated in user space, but hey, I'm gonna wedge it into the kernel 'cause that's just uber cool. I would also recommend groups of 2 students; try to give 1 - 2 weeks for each project, then do the filesystem last, give it 4 weeks, and make it kernel space.
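The behaviour of the /dev/msg driver described above can be modelled in a few lines of user-space Python before anyone touches kernel code. This is only a sketch of the semantics (writes enqueue, reads dequeue in FIFO order); the class name MsgDevice, the queue limit, and the choice to return an empty read when the queue is empty are all assumptions, not details from the actual assignment.

```python
from collections import deque


class MsgDevice:
    """User-space stand-in for a /dev/msg-style driver: writes enqueue, reads dequeue."""

    def __init__(self, max_messages: int = 16):
        self._queue = deque()
        self._max = max_messages          # assumed limit; a kernel driver must bound its memory

    def write(self, message: bytes) -> int:
        if len(self._queue) >= self._max:
            # A real driver might block here or fail with EAGAIN instead.
            raise BlockingIOError("message queue full")
        self._queue.append(bytes(message))
        return len(message)               # like write(2), report the bytes accepted

    def read(self) -> bytes:
        # An empty queue reads as end-of-file in this sketch; a real driver could block.
        return self._queue.popleft() if self._queue else b""
```

Writing b"first" and then b"second" and reading twice returns them in that order, which is the property the real driver has to preserve while also copying data safely between user and kernel space.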
During my last term I took Intro to Operating Systems, where we used a set of source code from Berkeley and Washington called Nachos. It basically emulates a MIPS 2000 platform. It is set up to have around 5 or 6 projects: multithreading, file systems, multiprocessing, memory management, and others. There are ports of the code for BSD, Linux, Solaris, etc. I suggest you go here for ideas: http://www.cs.washington.edu/homes/tom/nachos/
Here are some of the things I've enjoyed doing :) at school here at Concordia (the university in Montreal):
Writing a simple shell with support for pipes and I/O redirection.
Writing a simple client-server application (some kind of echo or file transfer client).
Writing a program to dump a file to stdout by reading the disk blocks directly.
Writing a rudimentary undelete for ext2.
Using semaphores to solve simple multi-thread problems (one writer/many readers, cars on a bridge, etc.).
IMHO, a number of smaller projects is more useful than one large one because the students can find what they like and then can maybe expand on it later when they become open source contributors.
Just look at any part of the system and ask "I wonder how that works" and voila, a project will hatch out of that slowly.
Good luck!
Dana
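As an illustration of the semaphore exercise in the list above, here is one way the classic "one writer, many readers" problem might be solved with two semaphores. This is a sketch in Python rather than whatever the Concordia course used; the names ReadWriteLock and run_demo are invented, and this particular variant favours readers (a steady stream of readers could starve the writer).

```python
import threading


class ReadWriteLock:
    """Readers-preference solution to the readers-writers problem using two semaphores."""

    def __init__(self):
        self._count_mutex = threading.Semaphore(1)  # protects _readers
        self._resource = threading.Semaphore(1)     # held by a writer or by the reader group
        self._readers = 0

    def acquire_read(self):
        with self._count_mutex:
            self._readers += 1
            if self._readers == 1:
                self._resource.acquire()    # first reader locks writers out

    def release_read(self):
        with self._count_mutex:
            self._readers -= 1
            if self._readers == 0:
                self._resource.release()    # last reader lets writers back in

    def acquire_write(self):
        self._resource.acquire()

    def release_write(self):
        self._resource.release()


def run_demo(n_readers=4, n_rounds=200):
    lock = ReadWriteLock()
    state = {"a": 0, "b": 0}    # invariant: a == b whenever no writer is active
    violations = []

    def writer():
        for _ in range(n_rounds):
            lock.acquire_write()
            state["a"] += 1
            state["b"] += 1
            lock.release_write()

    def reader():
        for _ in range(n_rounds):
            lock.acquire_read()
            if state["a"] != state["b"]:
                violations.append((state["a"], state["b"]))
            lock.release_read()

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state, violations
```

Calling run_demo() should come back with no recorded violations, since readers only ever look at the state while the writer is locked out. A good follow-up question for students is how to change it so writers cannot starve.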
I just finished a course like this. We used RedHat 5.2 (with a nice, relatively simple 2.0.x kernel that reduced the learning curve a bit). We had two kernel programming assignments:
1) Add process scheduling groups.
We added a couple of new system calls that allowed processes to create and then join new scheduling groups. You could set priorities for group member processes, and then any time one member of the group came up, the highest-priority TASK_RUNNING process in the group would be selected to run instead. This led to pretty useless behaviour but didn't involve anything other than adding to the scheduler, so you didn't screw up the behaviour of non-group-member processes.
2) Add a new in-RAM filesystem to the kernel.
We had to add a 128K in-RAM volatile (your data disappears when you unmount the fs) filesystem to the kernel. This was nice because you didn't have to create any user-space tools (other than your own version of mount). When you mounted one of these filesystems the kernel allocated 128K and created your filesystem on it. You could mount as many as you wanted and use them just like any chunk of disk space. This was a great way to learn the basics of Linux's rather cool VFS.
Neither of these projects was hugely difficult but they weren't trivial either. We also had to write some basic kernel functionality benchmarks and compare Linux 2.0/Sparc (our systems) vs. some Solaris/UltraSparc systems. That was interesting as well. This was a great course, so long as you liked a lot of programming.
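The selection rule from the scheduling-group assignment above can be sketched outside the kernel. The Python below is only a model of the rule as described (when a group member is picked, run the highest-priority TASK_RUNNING member of its group instead); the Task fields group and group_priority and the function name substitute_group_member are hypothetical, and the real assignment modified the 2.0.x scheduler in C.

```python
from dataclasses import dataclass
from typing import List, Optional

TASK_RUNNING = "TASK_RUNNING"
TASK_INTERRUPTIBLE = "TASK_INTERRUPTIBLE"


@dataclass
class Task:
    pid: int
    state: str = TASK_RUNNING
    group: Optional[int] = None    # hypothetical scheduling-group id, None if ungrouped
    group_priority: int = 0        # higher value wins within a group


def substitute_group_member(picked: Task, tasks: List[Task]) -> Task:
    """Given the task the normal scheduler picked, apply the group rule."""
    if picked.group is None:
        return picked              # non-group processes are left alone
    runnable = [t for t in tasks
                if t.group == picked.group and t.state == TASK_RUNNING]
    # Fall back to the original pick if, for some reason, nothing in the group is runnable.
    return max(runnable, key=lambda t: t.group_priority, default=picked)
```

Because ungrouped tasks pass straight through, the rule cannot disturb processes that never joined a group, which matches the comment's point about not breaking existing behaviour.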
Have you actually _looked_ at the kernel sources? :)
It's quite unapproachable, not because the things being done there are particularly complex, but because it's all over the place and almost completely undocumented/uncommented.
Great! Make the first assignment to write comments for some stable parts of the kernel - then submit patches to the tree!
In the Carnegie Mellon operating systems course (mostly taken by juniors/seniors, though it isn't specifically a "3rd year" course), you don't start with any operating system.
... that's kind of the point :)
The course is taught on SPARC emulators, which run on (you guessed it) SPARCs and make the architecture a bit more manageable. But you write the operating system.
It's not all that complex an operating system, nice and straightforward and unix-ish, but it's a hell of a lot for one semester. The course is done in project groups, and it has a reputation as about the hardest class out there.
I've had friends in the course come back after a week spent almost exclusively in the lab. They show up Friday afternoon in a zombie-like state...
--"What did you write this week in OS?"
--"Huh? Oh, inter-process data streams. You know, pipes"
--"Neat"
--"pipe pipe pipe! pipe! PIPE! PIPE!" [nervous sobs]
When I taught our (UNC-Charlotte's CSCI) graduate operating systems course, I assumed that the students had already taken an undergraduate OS course (sadly, sometimes too hopeful an assumption) which covered the core basics of memory management, process management, and context switching, and introduced the two-layer device driver approach (our undergraduate course uses the XINU book). I picked up where that course left off, covering more about device drivers, I/O descriptors and their interaction with system calls, and the filesystem (on-disk implementations, kernel implementations, different implementations at different mountpoints), then finishing off with distributed systems. One large component of the course was reading the Linux kernel source code in order to see a "real world" implementation of the coding concepts discussed in class. I have always been a critic of how too many CSCI courses focus solely upon writing projects, yet don't spend enough (or any) time having the students read non-trivial code. We wouldn't ask novelists-in-training, essayists-in-training, or poets-in-training to write more than we've asked them to read, would we?
Anyway, two series of projects accompanied the lectures and assigned code readings. The first was to design and implement a basic interactive shell, first with basic file redirection and piping, later adding redirection to TCP sockets. This project aimed at giving the students a taste of systems programming that they may not have otherwise received, plus hammering in the UNIX concept that read() / write() will work on any sort of descriptor, be it pipe, file, or socket, even without the knowledge / cooperation of the process doing the I/O. While writing the projects, the students were to read through the kernel code which implements the major system calls that they were using in order to see what was really going on (or at least to get a general idea that it all wasn't magic -- it all boiled down to "C" source code somewhere).
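The descriptor point above is easy to demonstrate. The sketch below uses Python's thin wrappers over the read() and write() system calls (os.read and os.write) and assumes a Unix-like system; the function names shout and demo are made up. The copying routine only ever sees integer descriptors and never learns whether it is talking to a pipe, a regular file, or a socket.

```python
import os
import socket
import tempfile


def shout(in_fd: int, out_fd: int) -> None:
    """Copy in_fd to out_fd in upper case using only read() and write()."""
    while True:
        chunk = os.read(in_fd, 4096)
        if not chunk:                   # zero bytes read means end of input
            break
        pending = chunk.upper()
        while pending:                  # write() may accept fewer bytes than offered
            written = os.write(out_fd, pending)
            pending = pending[written:]


def demo() -> dict:
    results = {}

    # Pipe in, regular file out.
    r, w = os.pipe()
    os.write(w, b"via pipe")
    os.close(w)                         # closing the write end gives the reader EOF
    with tempfile.TemporaryFile() as f:
        shout(r, f.fileno())
        os.close(r)
        os.lseek(f.fileno(), 0, os.SEEK_SET)
        results["pipe_to_file"] = os.read(f.fileno(), 100)

    # Socket in, pipe out.
    a, b = socket.socketpair()
    a.sendall(b"via socket")
    a.shutdown(socket.SHUT_WR)          # signals EOF to the reading side
    r, w = os.pipe()
    shout(b.fileno(), w)
    os.close(w)
    results["socket_to_pipe"] = os.read(r, 100)
    os.close(r)
    a.close()
    b.close()
    return results
```

A shell does the same trick with dup2() before exec: the child program just reads descriptor 0 and writes descriptor 1, whatever they happen to be connected to.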
The second project suite was the implementation of an inode-based filesystem, starting from the ground up. First write a simulated mini-SCSI bus that supported two types of devices (one with 512-byte sectors, the other with 4096-byte sectors, just to ward off assumptions at the inode/block management layer). Once that works, add an inode manager that can use one of the virtual SCSI disks. Lastly, add a directory services module on top of the inode manager, so that we can manipulate files, directories, and symbolic links.
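To give a feel for the layering described above, here is a much-reduced Python sketch: a virtual disk with a configurable sector size, an inode manager that never hard-codes that size, and a flat directory on top. Everything here is an assumption for illustration (class names, in-memory inode table, no SCSI bus, no threads, no symbolic links); the actual course project was far larger.

```python
class VirtualDisk:
    """In-memory stand-in for one simulated disk."""

    def __init__(self, sector_size: int, sector_count: int):
        self.sector_size = sector_size
        self.sector_count = sector_count
        self._sectors = [bytes(sector_size) for _ in range(sector_count)]

    def read_sector(self, n: int) -> bytes:
        return self._sectors[n]

    def write_sector(self, n: int, data: bytes) -> None:
        if len(data) != self.sector_size:
            raise ValueError("must write exactly one sector")
        self._sectors[n] = bytes(data)


class InodeManager:
    """Maps inode numbers to sector lists; asks the disk for its sector size."""

    def __init__(self, disk: VirtualDisk):
        self.disk = disk
        self._free = list(range(disk.sector_count))
        self._inodes = {}      # inode -> (length, sectors); kept in memory for brevity
        self._next = 1

    def create(self, data: bytes) -> int:
        size = self.disk.sector_size
        sectors = []
        for offset in range(0, len(data), size):
            chunk = data[offset:offset + size].ljust(size, b"\0")  # pad the last sector
            sector = self._free.pop(0)
            self.disk.write_sector(sector, chunk)
            sectors.append(sector)
        inode = self._next
        self._next += 1
        self._inodes[inode] = (len(data), sectors)
        return inode

    def sectors_used(self, inode: int) -> int:
        return len(self._inodes[inode][1])

    def read(self, inode: int) -> bytes:
        length, sectors = self._inodes[inode]
        return b"".join(self.disk.read_sector(s) for s in sectors)[:length]


class Directory:
    """Flat name-to-inode table layered on the inode manager."""

    def __init__(self, inodes: InodeManager):
        self.inodes = inodes
        self.entries = {}

    def write_file(self, name: str, data: bytes) -> None:
        self.entries[name] = self.inodes.create(data)

    def read_file(self, name: str) -> bytes:
        return self.inodes.read(self.entries[name])
```

Running the same 5000-byte file through a 512-byte-sector disk and a 4096-byte-sector disk uses 10 and 2 sectors respectively, with identical contents read back, which is exactly the assumption the two device types were meant to flush out.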
Ultimately, the projects asked a good deal from the students, since the majority of them had not written any multi-threaded OO systems that made use of message passing (over the SCSI "bus"), so not only did they get to simulate some kernel components, they also had to come up to speed with some relatively advanced programming designs. The folks who used C++ learned the hard way that (at the time) debugger support for multithreaded programs was, um, challenged. Folks who wrote in Java had a bit of an easier time. Depending upon the level of knowledge in your undergraduates, I would not recommend the filesystem project. The shell project, OTOH, would be applicable to either 3rd/4th-year undergraduates or graduate students, since it hits home on the core UNIX data structure -- the I/O descriptor. If the students were to have root access to the boxes, then I would perhaps have them extend an existing kernel subsystem or write a new driver given an existing one. What about a thorough examination of the Linux scheduler / context switching algorithm? Could they cut any fat from it, as the IBM JDK folks did? What about examining the timer system? What about implementing a new "toy" virtual device driver in the spirit of /dev/random (not that it is a toy, but it doesn't correspond to any single piece of hardware, per se), such as a simple message passing port? One process opens it up, writes to it, then closes it, followed by another process opening it and reading from it. That would demonstrate upper-layer device driver interfaces, plus the issue of passing bulk data to/from user space, and why time spent memcpy'ing becomes a factor in I/O bound systems.
Oh yeah, one other thing. You might want to think about obtaining the source code for more than one OS kernel (say also a *BSD kernel or the Solaris kernel -- being at an institution of higher learning, you should be able to get the Solaris source code w/o charge) in order to have the students compare / contrast the different approaches taken.
Have fun with the course!
In my OS class (and at other schools like UC Berkeley, Duke, and Harvard) we used a package called NachOS. It runs on a MIPS emulator, and you write large chunks of the OS yourself. We had to write processes, system calls, filesystems, VM, schedulers, and applications for the OS (the shell was just 5% of assignment 2). The final assignment is to write a couple of different schedulers or another subsystem, then analyze the hell out of its performance, which was really interesting.
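For a flavour of the "write a couple of schedulers, then analyze them" assignment, here is a toy comparison in Python. It is not NachOS code; the two policies (first come first served versus non-preemptive shortest job first), the burst lengths, and the function names are all assumptions chosen to keep the arithmetic checkable by hand.

```python
def average_wait(bursts):
    """Average waiting time when jobs run back to back in the given order."""
    waited, clock = 0, 0
    for burst in bursts:
        waited += clock        # this job waited for everything scheduled before it
        clock += burst
    return waited / len(bursts)


def fcfs(bursts):
    return list(bursts)        # first come, first served: keep arrival order


def sjf(bursts):
    return sorted(bursts)      # shortest job first, non-preemptive


jobs = [8, 4, 1]               # made-up CPU burst lengths, all arriving at time 0
fcfs_wait = average_wait(fcfs(jobs))   # waits are 0, 8, 12
sjf_wait = average_wait(sjf(jobs))     # waits are 0, 1, 5
```

With these numbers the average wait drops from 20/3 time units under FCFS to 2 under SJF. The real assignment adds the interesting complications: arrivals over time, preemption, and measurement on a running system.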
Granted, this course has a reputation for being WICKED hard. The whole OS is multi-threaded etc., so you have to deal with all the fun race condition issues just like a real OS. Running on a simulator makes life much better for a couple of reasons: 1) the crash/rebuild/restart/debug cycle is MUCH FASTER; 2) debugging real kernels w/o having two machines (for serial debugging) is not fun, plus you've got to have the machines for the students, which can be a pain; 3) come on, device drivers aren't the _interesting_ part of the OS, so using a system where that's already done is more useful.
I liked doing this better than what other people here have suggested. I think just writing a device driver is kinda silly. It's a reasonably straightforward project, but not really a good thing to do in an OS course; having students work with all the important OS components is much more useful. Starting with Linux is not a very good idea because of the large code base, and from what I've seen it's not really the best code for students to read. I would recommend one of the BSDs if you really want to go with the whole OS paradigm, especially FreeBSD when McKusick comes out with "The Design and Implementation of the FreeBSD Operating System." A second OS course or a graduate-level one is a better place to have students dive into a real OS; at that point they know the background theory of OS work and have written a fairly large code base of their own. Then it becomes much easier for students to dive into a real OS and do some research.
For books I'd say the Tanenbaum book (already mentioned here) and the 4.4BSD book are very good.
--Britt
No, really! Even though I am not well versed in kernel design, just flipping through the FreeBSD kernel code will teach you quite a bit about how the system works at a user level. The Design and Implementation of the 4.4BSD Operating System is an excellent resource to have handy when learning to program at the user level in Unix. If you use it, you will have a far greater understanding of how the kernel and libraries are handling the calls you make, and you will quickly come to understand programming better.
Linux Kernel Internals, 2nd ed. Beck, Bohme, Dziadzka, Kunitz, Magnus, Verworner. Addison-Wesley, 1998. 480 pages. ISBN 0-201-33143-8
Linux Device Drivers. Rubini. O'Reilly, 1998. 421 pages. ISBN 1-56592-292-1
Linux Core Kernel Commentary. Maxwell. Coriolis Press, 1999. 575 pages. ISBN 1-57610-469-9
Applied Operating System Concepts, 1st ed. Silberschatz, Galvin, Gagne. Wiley, 2000. 840 pages. ISBN 0-471-36508-4