Advanced Data Structures?

← Back to Stories (view on slashdot.org)

Posted by ryuzaki0 on Thursday October 5, 2006 @12:20AM from the mix-and-match-your-scalars-lists-and-hashes dept.

mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"

25 of 118 comments (clear)

Min score:

Reason:

Sort:

Canonical Terms of Academia by eldavojohn · 2006-10-05 00:21 · Score: 4, Insightful

Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do.
You'd be surprised. Some of the very basic network algorithms are kind of universal. Because, let's face it, some operating systems revolve around message passing.

CLR is far too large and almost exclusively covers basic territory.
Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.

Honestly though, when I went through college, the very last thing they tried to shove down my throat wasn't advanced data structures but instead Design Patterns. I'm not talking about algorithm design but instead just designs in general. This book might not do it in C but the ideas are pretty language independent. It's your 'cookbook' of designs laid out in class diagrams and sometimes more documentation is provided. Anyways, what I would actually suggest is that you combine a book that covers C structures with book and what you would have is a way for you to make a set of complex classes that emulate a proxy. Or you have an adapter class that is a "advanced data structure." I guess what I'm trying to say is that I don't see the point of paying attention to advanced data structures when it's really the way these data structures interact that is complex and important.

The big problem is that, in order to make design patterns work for you, you have to go through a lot of documentation and diagrams. Take this for example, you're writing an OS code and there's a part of the operating system that's going to be changing frequently depending on processor (I don't know a lot about OS development), let's say the scheduler. So what do you do? Do you just willy-nilly throw the scheduler and start going? Well, what would be best is that if you encapsulated the scheduler inside a package or library so that it could easily be exchanged for another one. You try to keep it small and modularize the classes and data structures that change from those that don't. Then, a new processor comes out that might demand a new scheduler and you open up this mini-project, edit/debug it and roll it out as something that just replaces the classes and libraries on all your deployed versions. Sounds like common sense, right? Well, for some it is and for some it isn't but to me this interaction is a complication of data structures and I think that the answer you're looking for isn't necessarily a book called "advanced data structures" just a few that explain how they work.

--
My work here is dung.
1. Re:Canonical Terms of Academia by tedgyz · 2006-10-05 00:47 · Score: 2, Interesting
  
  Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.
  Good point. While it doesn't cover everything, it is hardly an "Introduction". I've been programming for 20 years (including kernels). This book fits the 80/20 rule - covering 80% of what I've needed. The "Trie" structure is one omission that comes to mind.
  
  --
  "No matter where you go, there you are." -- Buckaroo Banzai
2. Re:Canonical Terms of Academia by Profound · 2006-10-05 02:11 · Score: 4, Insightful
  
  >> This book might not do it in C but the ideas are pretty language independent.
  
  Design patterns is a book about Object Oriented design, aimed at the C++/Java level of abstraction. In low level languages it is clumbsy to write them, and in higher ones, unnecessary.
3. Re:Canonical Terms of Academia by bill_kress · 2006-10-06 05:25 · Score: 2, Insightful
  
  So to say a pattern has a simpler implementation in another language means that you don't need to know it, understand it, understand when to use it and how?
  
  The patterns aren't about how to implement something--they are all trivial to implement, I've yet to see one that I didn't create and implement myself before the books were even concieved of.
  
  The big things about design patterns are that you have a common name for a pattern and some code smells to look for that indicate the need for a pattern. Because you have "Multi-dispatch polymorphism" available, did you use it? Did you know what smells indicated you should use it? Can you quickly communicate what it does without using the word "Visitor"? You using it in that paragraph communicated to me just what you were talking about, so you needed and used design patterns right there.
data structures books by Binder · 2006-10-05 00:26 · Score: 4, Informative

Algorithms in C, Parts 1-5 (Bundle): Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) by Robert Sedgewick

The Art of Computer Programming, by Donald E. Knuth

Two seminal works on the subject. It really depends on your needs however. Nearly anything you would need can be built off of the subjects in those books, but for more specialized datastructures you will probably have to move to specialized books.
Link lists? by tedgyz · 2006-10-05 00:38 · Score: 3, Funny

You use linked lists in your kernel?!?

--
"No matter where you go, there you are." -- Buckaroo Banzai
1. Re:Link lists? by aligature · 2006-10-05 00:48 · Score: 5, Insightful
  
  Linked lists are not necessarily a poor choice for a data structure. The cost of adding and removing elements from a balanced binary tree is completely wasted when you only need to add or remove from the beginning or end of a list. A linked list is a perfect data structure for a fifo or deque. All data structures have their advantages and disadvantages; your choice should depend entirely on your requirements.
2. Re:Link lists? by jack_csk · 2006-10-05 00:58 · Score: 2, Funny
  
  Why are you surprised? He just forgot to tell you that he's working in Redmond, WA.
3. Re:Link lists? by tedgyz · 2006-10-05 07:02 · Score: 2, Interesting
  
  It was sarcasm. Back in the day, kernels mostly used fixed-size arrays. I guess I'm showing my age.
  
  --
  "No matter where you go, there you are." -- Buckaroo Banzai
KISS! by Just+Some+Guy · 2006-10-05 00:41 · Score: 2, Insightful

Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do.

WTF? People who try to move cleverness in low-level critical code should be shot. Look, there's a time and a place for everything, but this is the wrong approach for the problemspace you're working in. You didn't say which kernel you're working with, but odds are extremely good that it already comes with a set of pre-defined macros or functions for handling data structures. Use them. They're much better tested and understood than the stuff you want to play with, and much less likely to make your codevelopers hate you.

I don't want to be a stick in the mud, but seriously, this just isn't the place to start experimenting.

--
Dewey, what part of this looks like authorities should be involved?
1. Re:KISS! by Mr2cents · 2006-10-05 02:01 · Score: 4, Informative
  
  Wikipedia has a nice collection of data structures. Maybe that's what you're looking for?
  
  http://en.wikipedia.org/wiki/List_of_data_structur es
  
  --
  "It's too bad that stupidity isn't painful." - Anton LaVey
Computer Algorithms / C++ by gregor_jk · 2006-10-05 01:16 · Score: 2, Informative

I was taught Computer Algorithms / C++ by Horowitz, Sahni and Rajasekaran( I believe they have a pseudocode version) and it was great. The seemed to be a perfect balance between the implementation and the ideas behind them.
Data Structures and Algorithms by kjhambrick · 2006-10-05 01:34 · Score: 5, Informative

Data Structures and Algorithms (ISBN 0201000237) by Aho, Ullman and Hopcroft is a classic that belongs on every programmer's bookshelf (hell, keep it on your desk!).
Re:What are you missing? by Greyfox · 2006-10-05 02:02 · Score: 4, Interesting

If you really want to, you can actually get by with just one data-structure:
And most programmers do. Read some code out in the industry and you'd think that 90% of the programmers out there never took a data structures class. Even if you're using a language like Java or C++ you really should know the implementation details of each so you can make informed decisions about the trade-offs in performance and maintainability. If you can't visualize the layout of your data and how you'll be accessing it, you really won't be able to pick the best structure for the job (Or roll your own if none of the usual ones fit.)
I can't think of a C project I've worked on where I didn't have to write a basic data structure like a stack or a linked list. On a couple of them the programmer passed data around as char data and used offsets to get at the data in them. On one, the programmer didn't even know about structs (Or null string termination.)
Having a "Convenient" language really isn't much of an excuse for not knowing this stuff. My data structures prof deliberatly chose an inconvenient language (BASIC) to show data structure implmentations. You can implement any structure in terms of array if you have to.
It'd be nice if more programmers would actually put some thought into their code before they go and start writing stuff. I wouldn't seek to discourage that...

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Network algorithms by coyote-san · 2006-10-05 02:40 · Score: 2, Informative

Network algorithms are useful in an astoundishing number of places, especially in those cases where all flows are either 0 or 1. You just need to expand you idea about what the networks and flows represent.

E.g., my professor used network algorithms when writing the software that matches medical schools and students. Each school prioritizes it's candidate students, each student prioritizes his schools. The program finds the best matches. He said it had taken many hours (12 hours?) to do originally, but with the network flow it was a few minutes on an old kaypro.

Another example was determining when teams are mathematically eliminated from playoffs. It sounds like a simple problem, but it isn't since a team could lose a game yet remain in the running because a different team lost its own game.

A decade ago these algorithms would be overkill for a kernel. Today... I can definitely see how these algorithms could make a difference in some places.

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Re:What are you missing? by mschaef · 2006-10-05 03:05 · Score: 5, Informative

"Lisp uses only nested linked lists, that can be interpreted as binary trees."

As much as people keep saying it, this is totally false... Common Lisp does not even have lists as an intrinsic type. What it does have are cons cells, several kinds of numbers, symbols, characters, strings, several kinds of vectors and arrays, hash tables, structures, instances, and lexical closures among other types native to the language.

Lists are to Lisp what Strings are to C: a convention built upon more basic structures. In the case of C, a string is a convention built up on null terminated arrays of characters. In the case of Lisp, a list is a series of chained together cons cells.
Re:What are you missing? by Anonymous Coward · 2006-10-05 03:40 · Score: 2, Insightful

Yeah, you pretty much eliminate all tree structures, and hence all graphs structures, once you demand better than O(lg n) performance.

The only way to improve on this in slower data structures is to use radix-style keying to cheat, so that your lookup keys tell you exactly how to find the value you're looking for. But this isn't really an "advanced" data structure, just an optimization of a basic one. (In fact, a lot of "advanced" data structures are just such application-specific optimizations; radix keying is obviously dependent on domain knowledge of your data set.)

Almost all interesting data structures (in terms of implementation; lists are good enough for 90% of cases) are trees, or sometimes graphs, and even those are rarely needed in the majority of applications. The exceptions are probably so specialized that you'll never find a single book to cover even an interesting subset. Your best bet at that point is a focused literature search (if you have access to article databases), or perhaps a Google.

You don't really need advanced data structures so much as advanced algorithms for using those data structures. The basic mechanisms for assembling and accessing a list, tree, graph, or hash table are basically the same for all algorithms. There's some clever work on, for example, lock-free data structures for concurrent programming that would probably be useful for a kernel programmer, but that's still not a matter of new data structures, but rather having full knowledge of the concurrency model of the multiprocessor you're working with, and optimizing for it, which is why it's mostly of interest to kernel programmers, and other people coding to the metal.
Okasaki book by philgross · 2006-10-05 03:50 · Score: 2, Informative

A few years ago, I took Chris Okasaki's "Advanced Data Structures" course at Columbia. Looking at the Wikipedia Data Structures article others have mentioned, it's pretty good, but doesn't include, e.g. Pairing Heaps, which have excellent real-world performance. You might want to check out Chris's book Purely Functional Data Structures (ISBN 0521663504). Even though you're not working with a functional language, it's a good catalog of some recent data structures, and from what I saw in the class, he understands them at a very deep level.
Re:What are you missing? by 0xABADC0DA · 2006-10-05 04:55 · Score: 2, Interesting

I find it fascinating that a kernel developer would be concerned with O() rather than the actual runtime. I thought kernel developers in general just put an arbitrary upper bound so their "O(n^2) && n -lt 1024" was faster than the "O(log n)" for any n.

Now if what you are actually looking for is some fun how about embedding neko in the kernel, because while you can add some cool things like say tries (prefix trees) the real performance bottleneck is between apps and the kernel. Neko could safely call getpid() in the kernel 1000x more times than any user app could. This opens a lot of possibilites such as user code safely reading kernel structures directly with nearly zero overhead, functioning as event callbacks, etc.

As an example, take something like beagle where it ideally gets notified of every file change and then, at some later time re-index the files. Do you really want every file modification doing a kernel-user-kernel switch? Or some hideous complicated caching API to specifically send these events all in a batch? (inotify I am looking at you...). Think a generic BPF.
A complete book online for free by exp(pi*sqrt(163)) · 2006-10-05 05:11 · Score: 3, Informative

Purely Functional Datastructures by Okasaki.

--
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
Re:What are you missing? by thsths · 2006-10-05 07:43 · Score: 3, Informative

> As much as people keep saying it, this is totally false... Common Lisp does not even have lists as an intrinsic type.

Funny how LISP stands for LISt Processing. And this thread is about data structures, not data types. The cons cell might be the central data type in LISP, but the nested list is the central data structure.

Just think about it. '(a b c) will be recognised as a 3 element list by any LISP programmer, and not as 2 cons cells plus 3 atoms (which would be equally correct). Now you can also construct binary trees with cons cells, but that is more complicated and therefore a bit rare.

Thomas
Re:What are you missing? by mschaef · 2006-10-05 09:00 · Score: 3, Informative

"Just think about it. '(a b c) will be recognised as a 3 element list by any LISP programmer, and not as 2 cons cells plus 3 atoms (which would be equally correct). "

Actually no... it's 3 cons cells. The syntax (a b c) is shorthand for (a . (b . ( c . ()))). This kind of misunderstanding is precisely why any competent (key word) LISP programmer understands the difference between lists and cons cells I was referring to in my original post. The closest structure with only two cons cells is this (a b . c).

Now, As long as we're being pedantic, what you actually wrote, '(a b c), is shorthand for (quote (a b c)). The numbers of cons cells and atoms in that case is a 'exercise for the reader', but suffice it to say that both your numbers are wrong.
byte-order independence by fortunatus · 2006-10-05 09:32 · Score: 2, Interesting

On a couple of them the programmer passed data around as char data and used offsets to get at the data in them.
Although mentioned in a negative light,
This technique is important when you have to move data between big-endian and little-endian processors. It lets you write portable I/O routines.
Re:dude by mschaef · 2006-10-06 02:11 · Score: 2, Funny

"you just made yourself look like an ass."

Dude, I was arguing on Slashdot about the number and proper usage of data types in Lisp, of all things... I think looking like an ass was pretty much a part of the deal from the get go.
Re:dude by mschaef · 2006-10-09 01:19 · Score: 2, Insightful

"But you wear it so well! "

Well, we're all 'special' in our own unique ways. :-)

To be frank, I wouldn't have been quite so harsh (ass-like, if you wish) if he hadn't just happened to make one of mistakes that proved exactly my point about the difference between cons cells and lists.

"Based on your argument C doesn't have arrays, just pointers with some syntactic sugar."

I'd actually agree with that.... A bit of trivia: in C, what's the difference between a[b] and b[a]? ...... Nothing, they are semantically equivalent to *(a + b) and *(b + a).

Keep in mind that I'm not saying any of this is a bad thing: in the case of Lisp lists, the added flexibilty that comes from their representation can be tremendously useful. However, it does complicate things. Just as an example, consider what happens when you define a function that traverses a list. Since Lisp lists can't be identified as being finite until you traverse them, you need to either be willing to tolerate functions that don't end when passed certain kinds of 'lists' or include extra baggage to detect when they're in an infinite list. Lisp also doesn't make the guarantee that the last 'next list' in a list is even a list at all: another edge case that's introduced by the fact that Lisp doesn't have first class lists.

Something else to consider is that Lisp doesn't even have to have cons cells at all. There's nothing to keep them from being represented internally as structures, arrays, or even closures. In other words, not only does the LISt Processing language not have lists as a first class type, it doesn't even have to have their component cons cells as a first class type. Lists can entirely be implemented using types the original poster seems to believe Lisp doesn't even have at all. Which brings me to the second reason I was an ass in my response... my experience has been that blanket statements tend to oversimplify things and can ultimately lead to bad decisions. If this guy has really reduced Lisp to nested lists, Perl to hash tables, Embedded programming to arrays, and Databases to B-trees, I question his ability to think critically in any amount of depth about software engineering problems. Of course, he could also be the best developer on the planet... After all, this is Slashdot, so you do have allow for the possibility of incompletely thought through 'pissing contest' posts. Which is the third and final reason I was somewhat an ass. :-)

At the very least the original poster has shown the presence of mind not to keep posting in this thread..