Advanced Data Structures?

← Back to Stories (view on slashdot.org)

Posted by ryuzaki0 on Thursday October 5, 2006 @12:20AM from the mix-and-match-your-scalars-lists-and-hashes dept.

mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"

8 of 118 comments (clear)

Min score:

Reason:

Sort:

Canonical Terms of Academia by eldavojohn · 2006-10-05 00:21 · Score: 4, Insightful

Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do.
You'd be surprised. Some of the very basic network algorithms are kind of universal. Because, let's face it, some operating systems revolve around message passing.

CLR is far too large and almost exclusively covers basic territory.
Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.

Honestly though, when I went through college, the very last thing they tried to shove down my throat wasn't advanced data structures but instead Design Patterns. I'm not talking about algorithm design but instead just designs in general. This book might not do it in C but the ideas are pretty language independent. It's your 'cookbook' of designs laid out in class diagrams and sometimes more documentation is provided. Anyways, what I would actually suggest is that you combine a book that covers C structures with book and what you would have is a way for you to make a set of complex classes that emulate a proxy. Or you have an adapter class that is a "advanced data structure." I guess what I'm trying to say is that I don't see the point of paying attention to advanced data structures when it's really the way these data structures interact that is complex and important.

The big problem is that, in order to make design patterns work for you, you have to go through a lot of documentation and diagrams. Take this for example, you're writing an OS code and there's a part of the operating system that's going to be changing frequently depending on processor (I don't know a lot about OS development), let's say the scheduler. So what do you do? Do you just willy-nilly throw the scheduler and start going? Well, what would be best is that if you encapsulated the scheduler inside a package or library so that it could easily be exchanged for another one. You try to keep it small and modularize the classes and data structures that change from those that don't. Then, a new processor comes out that might demand a new scheduler and you open up this mini-project, edit/debug it and roll it out as something that just replaces the classes and libraries on all your deployed versions. Sounds like common sense, right? Well, for some it is and for some it isn't but to me this interaction is a complication of data structures and I think that the answer you're looking for isn't necessarily a book called "advanced data structures" just a few that explain how they work.

--
My work here is dung.
1. Re:Canonical Terms of Academia by Profound · 2006-10-05 02:11 · Score: 4, Insightful
  
  >> This book might not do it in C but the ideas are pretty language independent.
  
  Design patterns is a book about Object Oriented design, aimed at the C++/Java level of abstraction. In low level languages it is clumbsy to write them, and in higher ones, unnecessary.
2. Re:Canonical Terms of Academia by bill_kress · 2006-10-06 05:25 · Score: 2, Insightful
  
  So to say a pattern has a simpler implementation in another language means that you don't need to know it, understand it, understand when to use it and how?
  
  The patterns aren't about how to implement something--they are all trivial to implement, I've yet to see one that I didn't create and implement myself before the books were even concieved of.
  
  The big things about design patterns are that you have a common name for a pattern and some code smells to look for that indicate the need for a pattern. Because you have "Multi-dispatch polymorphism" available, did you use it? Did you know what smells indicated you should use it? Can you quickly communicate what it does without using the word "Visitor"? You using it in that paragraph communicated to me just what you were talking about, so you needed and used design patterns right there.
If you can use C++ I recommend the STL by Progman3K · 2006-10-05 00:37 · Score: 1, Insightful

No need to keep re-coding hasing lists when you have access to generic algorithms.

Are you trying to code a linked-list, or are you trying to solve a problem?

--
I don't know the meaning of the word 'don't' - J
KISS! by Just+Some+Guy · 2006-10-05 00:41 · Score: 2, Insightful

Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do.

WTF? People who try to move cleverness in low-level critical code should be shot. Look, there's a time and a place for everything, but this is the wrong approach for the problemspace you're working in. You didn't say which kernel you're working with, but odds are extremely good that it already comes with a set of pre-defined macros or functions for handling data structures. Use them. They're much better tested and understood than the stuff you want to play with, and much less likely to make your codevelopers hate you.

I don't want to be a stick in the mud, but seriously, this just isn't the place to start experimenting.

--
Dewey, what part of this looks like authorities should be involved?
Re:Link lists? by aligature · 2006-10-05 00:48 · Score: 5, Insightful

Linked lists are not necessarily a poor choice for a data structure. The cost of adding and removing elements from a balanced binary tree is completely wasted when you only need to add or remove from the beginning or end of a list. A linked list is a perfect data structure for a fifo or deque. All data structures have their advantages and disadvantages; your choice should depend entirely on your requirements.
Re:What are you missing? by Anonymous Coward · 2006-10-05 03:40 · Score: 2, Insightful

Yeah, you pretty much eliminate all tree structures, and hence all graphs structures, once you demand better than O(lg n) performance.

The only way to improve on this in slower data structures is to use radix-style keying to cheat, so that your lookup keys tell you exactly how to find the value you're looking for. But this isn't really an "advanced" data structure, just an optimization of a basic one. (In fact, a lot of "advanced" data structures are just such application-specific optimizations; radix keying is obviously dependent on domain knowledge of your data set.)

Almost all interesting data structures (in terms of implementation; lists are good enough for 90% of cases) are trees, or sometimes graphs, and even those are rarely needed in the majority of applications. The exceptions are probably so specialized that you'll never find a single book to cover even an interesting subset. Your best bet at that point is a focused literature search (if you have access to article databases), or perhaps a Google.

You don't really need advanced data structures so much as advanced algorithms for using those data structures. The basic mechanisms for assembling and accessing a list, tree, graph, or hash table are basically the same for all algorithms. There's some clever work on, for example, lock-free data structures for concurrent programming that would probably be useful for a kernel programmer, but that's still not a matter of new data structures, but rather having full knowledge of the concurrency model of the multiprocessor you're working with, and optimizing for it, which is why it's mostly of interest to kernel programmers, and other people coding to the metal.
Re:dude by mschaef · 2006-10-09 01:19 · Score: 2, Insightful

"But you wear it so well! "

Well, we're all 'special' in our own unique ways. :-)

To be frank, I wouldn't have been quite so harsh (ass-like, if you wish) if he hadn't just happened to make one of mistakes that proved exactly my point about the difference between cons cells and lists.

"Based on your argument C doesn't have arrays, just pointers with some syntactic sugar."

I'd actually agree with that.... A bit of trivia: in C, what's the difference between a[b] and b[a]? ...... Nothing, they are semantically equivalent to *(a + b) and *(b + a).

Keep in mind that I'm not saying any of this is a bad thing: in the case of Lisp lists, the added flexibilty that comes from their representation can be tremendously useful. However, it does complicate things. Just as an example, consider what happens when you define a function that traverses a list. Since Lisp lists can't be identified as being finite until you traverse them, you need to either be willing to tolerate functions that don't end when passed certain kinds of 'lists' or include extra baggage to detect when they're in an infinite list. Lisp also doesn't make the guarantee that the last 'next list' in a list is even a list at all: another edge case that's introduced by the fact that Lisp doesn't have first class lists.

Something else to consider is that Lisp doesn't even have to have cons cells at all. There's nothing to keep them from being represented internally as structures, arrays, or even closures. In other words, not only does the LISt Processing language not have lists as a first class type, it doesn't even have to have their component cons cells as a first class type. Lists can entirely be implemented using types the original poster seems to believe Lisp doesn't even have at all. Which brings me to the second reason I was an ass in my response... my experience has been that blanket statements tend to oversimplify things and can ultimately lead to bad decisions. If this guy has really reduced Lisp to nested lists, Perl to hash tables, Embedded programming to arrays, and Databases to B-trees, I question his ability to think critically in any amount of depth about software engineering problems. Of course, he could also be the best developer on the planet... After all, this is Slashdot, so you do have allow for the possibility of incompletely thought through 'pissing contest' posts. Which is the third and final reason I was somewhat an ass. :-)

At the very least the original poster has shown the presence of mind not to keep posting in this thread..