Slashdot Mirror


Advanced Data Structures?

mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"

5 of 118 comments (clear)

  1. Re:Canonical Terms of Academia by tedgyz · · Score: 2, Interesting
    Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.
    Good point. While it doesn't cover everything, it is hardly an "Introduction". I've been programming for 20 years (including kernels). This book fits the 80/20 rule - covering 80% of what I've needed. The "Trie" structure is one omission that comes to mind.
    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  2. Re:What are you missing? by Greyfox · · Score: 4, Interesting
    If you really want to, you can actually get by with just one data-structure:

    And most programmers do. Read some code out in the industry and you'd think that 90% of the programmers out there never took a data structures class. Even if you're using a language like Java or C++ you really should know the implementation details of each so you can make informed decisions about the trade-offs in performance and maintainability. If you can't visualize the layout of your data and how you'll be accessing it, you really won't be able to pick the best structure for the job (Or roll your own if none of the usual ones fit.)

    I can't think of a C project I've worked on where I didn't have to write a basic data structure like a stack or a linked list. On a couple of them the programmer passed data around as char data and used offsets to get at the data in them. On one, the programmer didn't even know about structs (Or null string termination.)

    Having a "Convenient" language really isn't much of an excuse for not knowing this stuff. My data structures prof deliberatly chose an inconvenient language (BASIC) to show data structure implmentations. You can implement any structure in terms of array if you have to.

    It'd be nice if more programmers would actually put some thought into their code before they go and start writing stuff. I wouldn't seek to discourage that...

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  3. Re:What are you missing? by 0xABADC0DA · · Score: 2, Interesting

    I find it fascinating that a kernel developer would be concerned with O() rather than the actual runtime. I thought kernel developers in general just put an arbitrary upper bound so their "O(n^2) && n -lt 1024" was faster than the "O(log n)" for any n.

    Now if what you are actually looking for is some fun how about embedding neko in the kernel, because while you can add some cool things like say tries (prefix trees) the real performance bottleneck is between apps and the kernel. Neko could safely call getpid() in the kernel 1000x more times than any user app could. This opens a lot of possibilites such as user code safely reading kernel structures directly with nearly zero overhead, functioning as event callbacks, etc.

    As an example, take something like beagle where it ideally gets notified of every file change and then, at some later time re-index the files. Do you really want every file modification doing a kernel-user-kernel switch? Or some hideous complicated caching API to specifically send these events all in a batch? (inotify I am looking at you...). Think a generic BPF.

  4. Re:Link lists? by tedgyz · · Score: 2, Interesting

    It was sarcasm. Back in the day, kernels mostly used fixed-size arrays. I guess I'm showing my age.

    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  5. byte-order independence by fortunatus · · Score: 2, Interesting
    On a couple of them the programmer passed data around as char data and used offsets to get at the data in them.

    Although mentioned in a negative light,

    This technique is important when you have to move data between big-endian and little-endian processors. It lets you write portable I/O routines.