Advanced Data Structures?
mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"
"No matter where you go, there you are." -- Buckaroo Banzai
And most programmers do. Read some code out in the industry and you'd think that 90% of the programmers out there never took a data structures class. Even if you're using a language like Java or C++ you really should know the implementation details of each so you can make informed decisions about the trade-offs in performance and maintainability. If you can't visualize the layout of your data and how you'll be accessing it, you really won't be able to pick the best structure for the job (Or roll your own if none of the usual ones fit.)
I can't think of a C project I've worked on where I didn't have to write a basic data structure like a stack or a linked list. On a couple of them the programmer passed data around as char data and used offsets to get at the data in them. On one, the programmer didn't even know about structs (Or null string termination.)
Having a "Convenient" language really isn't much of an excuse for not knowing this stuff. My data structures prof deliberatly chose an inconvenient language (BASIC) to show data structure implmentations. You can implement any structure in terms of array if you have to.
It'd be nice if more programmers would actually put some thought into their code before they go and start writing stuff. I wouldn't seek to discourage that...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I find it fascinating that a kernel developer would be concerned with O() rather than the actual runtime. I thought kernel developers in general just put an arbitrary upper bound so their "O(n^2) && n -lt 1024" was faster than the "O(log n)" for any n.
Now if what you are actually looking for is some fun how about embedding neko in the kernel, because while you can add some cool things like say tries (prefix trees) the real performance bottleneck is between apps and the kernel. Neko could safely call getpid() in the kernel 1000x more times than any user app could. This opens a lot of possibilites such as user code safely reading kernel structures directly with nearly zero overhead, functioning as event callbacks, etc.
As an example, take something like beagle where it ideally gets notified of every file change and then, at some later time re-index the files. Do you really want every file modification doing a kernel-user-kernel switch? Or some hideous complicated caching API to specifically send these events all in a batch? (inotify I am looking at you...). Think a generic BPF.
It was sarcasm. Back in the day, kernels mostly used fixed-size arrays. I guess I'm showing my age.
"No matter where you go, there you are." -- Buckaroo Banzai
Although mentioned in a negative light,
This technique is important when you have to move data between big-endian and little-endian processors. It lets you write portable I/O routines.