Understanding the Linux Kernel
Although isolated pieces of operating system internals are usually not difficult to understand, learning how a significant portion of a real OS works is a daunting task: there's a lot of code, some of it is complicated, and some of it operates under obscure assumptions that can be difficult to figure out by reading the sources. Two of the best existing books about OS internals have explained either a simplified but working OS (Tanenbaum's Minix book) or a real, but very small OS (Lions' book on Unix v6). Although these systems have the advantage of being easier to understand, there's an important reason why one might want to study Linux internals instead: Linux is currently relevant, it's likely to be around for a while, and any code you write can potentially be used by thousands of people the day after tomorrow. So, taking it as a given that a book about Linux internals is a good thing, how good is this one? Happily, it's very good - better than any previous such book that I've seen (Rubini's Linux Device Drivers book is also excellent, but it has a limited scope).
Understanding the Linux Kernel is good for several reasons. First, the authors have included quite a bit of explanatory material that isn't specifically about Linux - it's the kind of thing one would find in a good undergraduate OS textbook. This helps the reader link explanations of pieces of code to the abstract OS functions that they implement. Second, the authors have chosen a good level of abstraction: core kernel algorithms are explained in text, supplemented with short code sequences (simplified to remove optimizations) for important routines. Flowcharts are used to explain components with complex control flow, and tables and other diagrams are used when appropriate. Finally, the book is well arranged and well written, and there's an auxiliary index at the end that maps symbols mentioned in the book to source code files.
There are a few things I don't like about this book. Most importantly, there is no discussion of the network stack. As the authors say, this is a subject for another book, but by leaving out one of the most interesting and relevant parts of the kernel they are limiting their audience. A second drawback of this book (and of any Linux kernel book) is that since it seems to take about as long to write a good book as it does to write a major version of the Linux kernel, as I write this review it's about to become obsolete - it describes Linux version 2.2. However, at the end of each chapter there's a short note about things that are done differently in version 2.4. This will help preserve the relevance of the book after 2.4 comes out and, maybe more importantly, it gives the reader a sense of what parts of the kernel are under active development and what parts have become mature and stable.
Although Linux is very much in the Unix tradition, many details have changed. For example, early Unix kernels used simple algorithms (such as linear searches) and fixed table sizes. Modern Linux kernels, on the other hand, avoid arbitrary limits on the numbers of many kinds of internal OS objects, do not use linear searches when the number of objects to be searched is potentially large, and use amortized algorithms in many places. In all parts of the kernel, any special knowledge about the way that OS services will be used is exploited in order to improve average-case performance. For example, the slab memory allocator makes use of the fact that kernels often allocate many objects of the same size in order to reduce memory fragmentation and to avoid creating hot spots in the data cache. These algorithmic optimizations are much more pervasive (and much more effective) than micro-optimizations such as tuning register allocation or packing flags into the bits of a memory word - they're what make Linux useful in large-scale server environments where high throughput is critical. However, they also make the kernel code quite a bit more difficult to understand.
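To make the slab idea concrete, here is a minimal sketch in C of a fixed-size object cache, which is the core trick the paragraph above describes. It is not the kernel's allocator: the names obj_cache, cache_init, cache_alloc, cache_free, and cache_destroy are hypothetical, the cache manages a single slab obtained from malloc(), and locking, per-CPU caches, and cache coloring are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A free slot stores the link to the next free slot in its own memory,
   so the free list costs no extra space. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Hypothetical cache of equally sized objects carved out of one slab. */
typedef struct {
    void *slab;           /* one contiguous block holding every slot */
    free_node *free_list; /* singly linked list of unused slots */
    size_t obj_size;      /* slot size after rounding up for alignment */
    size_t in_use;        /* number of slots currently handed out */
} obj_cache;

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
int cache_init(obj_cache *c, size_t obj_size, size_t capacity) {
    size_t align = _Alignof(max_align_t);

    /* Each slot must be able to hold a free_node while it is unused. */
    if (obj_size < sizeof(free_node)) obj_size = sizeof(free_node);
    if (obj_size > SIZE_MAX - (align - 1)) return -1;
    /* Round the slot size up so every slot is suitably aligned. */
    obj_size = (obj_size + align - 1) / align * align;
    if (capacity == 0 || obj_size > SIZE_MAX / capacity) return -1;

    c->slab = malloc(obj_size * capacity);
    if (c->slab == NULL) return -1;
    c->obj_size = obj_size;
    c->in_use = 0;
    c->free_list = NULL;

    /* Thread every slot onto the free list, lowest address first. */
    for (size_t i = capacity; i-- > 0; ) {
        free_node *n = (free_node *)((char *)c->slab + i * obj_size);
        n->next = c->free_list;
        c->free_list = n;
    }
    return 0;
}

/* Pops a slot in constant time; returns NULL when the slab is exhausted. */
void *cache_alloc(obj_cache *c) {
    free_node *n = c->free_list;
    if (n == NULL) return NULL;
    c->free_list = n->next;
    c->in_use++;
    return n;
}

/* Pushes a slot back; the most recently freed slot is reused first. */
void cache_free(obj_cache *c, void *obj) {
    assert(obj != NULL && c->in_use > 0);
    free_node *n = obj;
    n->next = c->free_list;
    c->free_list = n;
    c->in_use--;
}

void cache_destroy(obj_cache *c) {
    free(c->slab);
    c->slab = NULL;
    c->free_list = NULL;
}
```

Because every slot has the same size, a freed slot can satisfy any later request exactly, so the slab never fragments internally, and both allocation and release are a couple of pointer operations. Reusing the most recently freed slot first also makes it more likely that the memory is still in the data cache.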
Given this complexity, it seems reasonable to ask who needs to read this book and how well it suits their needs. Three groups of people come to mind. First, potential kernel hackers will find this book to be a good overview of different parts of the kernel. Of course, for people like this a book is no substitute for lots of code reading, but it's a good start. Another potential audience is the group of people who need to understand the kernel in order to extract high performance from it; for example, authors of databases or network servers. This group's needs are well served by this book: the authors often point out why certain heuristics were chosen - this may help people whose applications have run afoul of a resource allocation policy that was designed to serve a different class of applications. Finally, computer science students interested in the internals of a real OS would do well to read this book. It would make a good supplement to a standard OS textbook in an introductory class on operating systems. However, Linux appears to be far too large to understand in its entirety in a single semester: classes that attempt to do this should use a teaching OS like Minix. To benefit from this book, readers should have knowledge equivalent to a couple of semesters of computer science: a basic understanding of programming, of the services an OS provides to user-level programs, and of the hardware mechanisms used by an OS.
This is a good book. The authors have cracked open a large collection of code that's currently very relevant. If they are in for the long haul and release revised books in a timely way, then this will likely become and remain the definitive explanation of Linux internals.
The web site for the book is here.
You can purchase this book at Fatbrain.
It is a wise man who knows what he does not know.
I read the internet for the articles.
If you are looking for a book that compares various OSes, something by Silberschatz or Stallings might be more your speed. I don't really find those authors as helpful in my work, though. Tanenbaum (who influenced Linus' initial design) wrote one called "Modern Operating Systems" which is pretty interesting; it may be the best of that genre. The BSD book and the Linux book are not comparable to Tanenbaum's book (or for that matter, to each other, really).
I own both books (and Tanenbaum's) and I have to admit that I find the BSD book most useful. I think that's because BSD was more "designed" and Linux is more a "big ball of mud" that just got to be how it is by evolution. The types of bugs I've seen over the years in Linux seem to reflect that, while the kernel and VFS design is quite clever (in an engineering sense), there are aspects pertaining to networking and filesystems (most notably) that only get addressed when they become a problem to someone who is willing to solve them (eg. Direct Server Return in 2.2.14). I'm not saying there is no design to Linux, but it is less pronounced than in *BSD. I do *NOT* think that ESR's Cathedral and Bazaar analogy is as apt as many other people seem to believe. Maybe it's just because BSD is easier to understand for me. And I work with it a lot. (and Linux... and Solaris... and NT/2K...) On the other hand, there is a much greater level of detail in Bovet and Cesati's book, line-by-line analysis of much of the kernel's data structures, which I am finding useful in hacking on it.
My bias:
I am an applications programmer, sysadmin, and network administrator, rather than a kernel developer or embedded systems programmer. So I am more interested in network and filesystem details than how certain atomic operations are implemented. I occasionally change little things here and there in the Linux kernel (almost never changing actual code semantics in any of the BSDs), but nothing earth shattering in either case. On the other hand, it is my job to *tune* kernels.
My comparison:
The Linux book seems to concentrate on internal data structures, how pieces of code are implemented, and why; more a tinkering than design perspective. The BSD book is a design-motivating-an-implementation book (as per the title). They're both useful books, but for very different reasons. At only $40, it's worth buying the Bovet and Cesati book for guidance if you hack on Linux AT ALL, especially on a laptop or a disconnected workstation (hopefully with all the HOWTOs and source installed on it!).
Remember that what's inside of you doesn't matter because nobody can see it.
The Win2K kernel (or NT for that matter) is hardly less complex than Linux. If anything, it is a great deal more complex. This book addresses a need which simply isn't present for NT/W2K installations, because 99.9% of them don't have source and can't change anything internal to their OS. Linux is very different, and you don't seem to understand why. This isn't a book for end users.
Remember that what's inside of you doesn't matter because nobody can see it.
Once you start tinkering under the hood, there's no turning back. There are a few things that I'd love to be able to do in C, but the way to do it isn't standard across platforms or compilers. I can figure it out, but I'd have to test it on *everything*, or only support a few platforms where I know enough about the internals.
:)
For example: let's say you have a function that gets passed a region of malloc()'ed memory. You want to know exactly how much memory you have to play with. In typical implementations, that number is stored somewhere just before the address the pointer refers to. malloc() allocates a little extra memory, writes some status info at the beginning, and returns the address just past that header to you. However, exactly where and what it writes is unspecified.
I'd love to have a function in C that did this for me, but alas, there isn't one. So what I have to do is seek back through before the pointer, hope everything is allocated, and look for something like status info. I've done this on Solaris and Linux, and they don't do it the same way.
I think stuff like this is fascinating, but using internal knowledge to write your programs can be dangerous; it must be done carefully to avoid breakage.
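One way to get the same information without peeking at the allocator's private header is to write your own header in front of the block. The sketch below does that in C; the names sized_header, sized_malloc, sized_size, and sized_free are hypothetical, and it only works for memory that was allocated through the wrapper. (glibc also offers a non-standard malloc_usable_size() in <malloc.h>, but it is not portable and may report more than was requested.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored immediately before the pointer handed to the caller.
   The union pads it to the strictest alignment, so the payload that
   follows stays suitably aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} sized_header;

/* Allocates size bytes and records the requested size in front of them. */
void *sized_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(sized_header)) return NULL;
    sized_header *h = malloc(sizeof(sized_header) + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1; /* first byte after the header */
}

/* Returns the size originally requested for a sized_malloc() block. */
size_t sized_size(const void *p) {
    assert(p != NULL);
    return ((const sized_header *)p - 1)->size;
}

/* Releases a sized_malloc() block; passing NULL is a no-op, as with free(). */
void sized_free(void *p) {
    if (p != NULL) free((sized_header *)p - 1);
}
```

The cost is one header per allocation, and the discipline that a block from sized_malloc() must never go to plain free(), or the other way round. In exchange, nothing depends on how a particular platform's malloc() lays out its bookkeeping.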
---
pb Reply or e-mail; don't vaguely moderate.
Kernel Hacking for Dummies.
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
Actually, I found the kernel to be quite simple. I bought "Linux Core Kernel Commentary", and looking at the code, it was very simple. Other people have also mentioned that the kernel itself has no real impact on user-friendliness. Actually, good user-friendliness requires _much_ more complexity. Let me give you an example:
If you have a form that asks for a user's country, then afterwards asks for their state/province, the user-unfriendly version would simply show the same form to everyone. The user-friendly version would look at the country, and then decide (a) whether the field is needed or not, and then (b) what that field is called in that location. So, the user-friendly version is actually much, much more complex. With simple code, the user does the work; with complex code, the user gets it easy. In Linux 2.4, auto PnP support was added. That increased user-friendliness, and also greatly increased the code complexity (they had to add a whole resource-management subsystem).
Engineering and the Ultimate
Free as in beer, maybe, but the restrictions placed on you if you take such a course can be rather onerous.
It can be rather difficult to legally find work with a Microsoft competitor, as you're "contaminated" with Microsoft IP.
DNA just wants to be free...
It is well known that this book contains fragments of code covered under the insidious GNU Public License. If any of this code made it into our products it could force us to make the rest of the source for that product available.
We advise coders wanting to replicate the stability of Linux to look elsewhere. Reading this book can only harm your career at Microsoft!
... and today's pet project has
> One or two more options, and we have the next /. poll!
But is it worth CowboyNeal?
Okay, I'll just be grabbing my coat . . .
Geoff
I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
Whoa there. When you talk about Linux becoming "less sophisticated and more user friendly" that really has almost nothing to do with the internals of the Linux kernel. Also, to become more user friendly, the kernel will probably continue to become more sophisticated, not less - things like hot swapping PCMCIA cards and USB devices require a very sophisticated kernel to be user friendly.
But in general, user friendliness is more about the graphical desktop and applications: Gnome, KDE, KOffice, StarOffice, Evolution, etc. There are a lot of differences at that level, so it would be impossible to have a single user reference for the notional "Linux User Interface". But the GUI is not strongly tied to the kernel.
I think that it should be possible to have good reference documentation for the kernel, probably as a two-volume set - one for the networking stack, and one for everything else. Just because you think the kernel is too complex to understand doesn't mean that everyone else agrees... And if we don't try to document it and understand it, we will end up in a situation where a very small number of people have the technical skills to work on the kernel, and that would be bad for the Linux community. One of the things that keeps Linux on course is that people have the option to fork it - either because they want to experiment with something, or because they have a different problem to solve (RTLinux), or they think they really can do it better than Linus. If the kernel is too hard for anyone but Linus, Alan Cox, and 30 other people in the world to understand, then that option to fork isn't really there anymore, and that would be bad for the future of Linux.
User documentation, on how to use Linux, is also important but is a completely separate issue and is probably best done by documenting distributions. Like "How To Use Red Hat 7". Of course that book doesn't need to say anything about how the kernel works - users don't care, and they shouldn't have to.
Torrey Hoffman (Azog)
"HTML needs a rant tag" - Alan Cox
One or two more options, and we have the next /. poll!
--
"It's tough to be bilingual when you get hit in the head."
- the network stack isn't covered
- it'll be out of date soon
The first complaint can be answered by pointing out that two other Linux books out there, the core kernel commentary and Linux internals, don't cover the network stack either; in fact there is another book that does just that and that alone. Many topics can't be covered in a book and, as the reviewer admits, the authors say networking deserves a book of its own. They have no reason to worry about limiting "their audience". They did such a thorough job with this 700 page book that if they were to include networking - and do the same thorough job with that subject too - the book would bloat to 1400 pages and not get out the door until kernel 3.0.
The reviewer's second complaint is even dumber. Since the kernel is a construction based in time (rather than something more eternal), descriptions of it are going to be outdated eventually. So will any other Linux book, whenever a new version comes out. Even the New York Times will be outdated tomorrow. Big fucking deal. The authors are tracking the 2.4 changes for the next version of the book - and they plan to keep a website for the book - so everything should be okay. Plus their notes of what's new in 2.4 should be all you need for the time being.
Sorry to vent, but I hate it when reviewers feel obligated to come up with flaws, just to make it seem like they did a super-penetrating read of the text. If you want to come up with flaws, try reading the book past the introduction. I'm sure you'll find a few legitimate ones at least.
If the API is any indication, I can only imagine what the implementation is like. Actually, I was looking at the bootloader code for Win95 the other day. Some guy had disassembled it, and he kept referring to the code's author as "some crazy mother f*cker." Take a look here. A couple of the comments are hilarious (as is the code!)
A deep unwavering belief is a sure sign you're missing something...
I doubt it. Microsoft is obsessive about documentation. The DirectX docs, for example, are around 800 pages just for the API. Anything and everything about that API is in those pages. I have a hard time believing that the same level of documentation doesn't exist for the source itself.
A deep unwavering belief is a sure sign you're missing something...
Even a 70 minute CD will store you without compression. 80 min is nearly 700 meg.
A deep unwavering belief is a sure sign you're missing something...
Umm, the kernel code has nothing whatsoever to do with the system code. If you've ever looked at Win32, you'll know that it is nowhere near as simple as POSIX (with one Xception.) The fact that the userspace is convoluted is due to crappy design of the userspace, not crappy code in the kernel. (The kernel code is a bit convoluted, but most people say it is quite elegant. But IANAKE)
A deep unwavering belief is a sure sign you're missing something...
I wonder what it costs to look at the Windows source.
For starters, the inability to ever code another OS again. (NDAs)
-Michael
and I was thoroughly pleased. I agree with Timothy, I was disappointed that the network stack was ignored... especially when so much time is devoted to various types of memory allocation. Unless they have a book cooking to cover this topic, I feel mystified by the networking code. However, it was very beneficial to see how the VFS works, and the chapter on the Second Extended Filesystem was very insightful and informative. Furthermore, I liked the treatment of the bootstrapping process in the Appendix; this was most helpful in understanding it so you can exert finer boot control. Now I want Writing Device Driver! I give it an 8.
Black holes are where the Matrix raised SIGFPE
Can be free, since Microsoft gives the source to some academic institutions.
At my school I've seen a notice for a lab course, in which the NT sources are examined and extended to support experimental services and algorithms.
I wonder why there hasn't been a good book explaining at a high level how the kernel works with the rest of the files in a particular distro. Of course, there would need to be different books for each distro, but for those of us who are still learning how it's all wired together it would be illuminating. Even a decent flowchart would be a nice tool.
Of course, if there is such a thing and I've overlooked it, please flame me.
Cheap shot...
{Body bgcolor="blue" text="white"} {center} {strong} A fatal exception error has occured in...
Hopefully I didn't put any [] around my words.
which goes into greater depth than most people have ever seen of the kernel source itself
I think I know what you were trying to say (maybe) - but Timothy, do you ever think to proof posts you read?
I highly recommend Understanding the Linux Kernel.
--
Scott Robert Ladd
Master of Complexity
Destroyer of Order and Chaos
All about me
Well, this seems interesting for most people enthusiastic about Linux, etc. Learning more about how things work internally can make it somewhat easier to diagnose/fix problems IMO.
About the windows source... hhm.. there must be some internal documents explaining things, I'm guessing. Or maybe not...
Moz.
see a Text Widget
Does anyone know if this book is somewhat more geared towards a layman, maybe a guide rather than a reference?
;)
This book is definitely more of a guide, however, it is detailed enough to serve as a general reference as well. (Detailed reference == see the code)
If you mean TCP/IP illustrated V.2 concerning the BSD stack implementation, then yes, this book might be slightly better from a technical prerequisite standpoint.
You will want to know some basic CPU architecture information to get the most out of this book, i.e. how CPU caches work, virtual/physical address mapping, etc.
Seems like a sweet book. I own the BSD devil book; while occasionally I can decipher something out of it, it usually makes my head spin. Does anyone know if this book is somewhat more geared towards a layman, maybe a guide rather than a reference?
Also I wish they would make a book that was like a "stroll through OS design", covering the differences between OSes, what choices they made, how it affects performance, scalability, etc. Linux 2.2 is pretty 0ld sk00l compared to Solaris and FreeBSD.
-Jon
Streamripper
this is my sig.
I wonder what it costs to look at the Windows source.
It probably costs you your sanity.
Actually, if you're interested in writing Linux device drivers, I'd highly recommend O'Reilly's other Linux hacking book, Linux Device Drivers (more info here). I used this book to write a relatively simple device driver for a device about two years ago. It was incredibly helpful. I assume most of the information is still relevant with today's kernels. This book, combined with the existing drivers for other devices, provided me everything I needed to know.
I recently purchased this book (about a month and a half ago) and am delighted in its breadth and clarity.
The authors lay out the information they wish to present clearly, and every chapter is a refinement of these main areas of functionality.
The book is also sprinkled with a lot of code, so that you can see the concepts in action. There are also plenty of diagrams, which gives the book a feel similar to something R. Stevens would write. (like TCP/IP Illustrated / Unix Network Programming)
If this is the kind of thing you're interested in, definitely spring for a copy. It can be a bit hard of a read given its size and density, however worth the effort.