Advanced Data Structures?
mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"
Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.
Honestly though, when I went through college, the very last thing they tried to shove down my throat wasn't advanced data structures but instead Design Patterns. I'm not talking about algorithm design but instead just designs in general. This book might not do it in C but the ideas are pretty language independent. It's your 'cookbook' of designs laid out in class diagrams and sometimes more documentation is provided. Anyways, what I would actually suggest is that you combine a book that covers C structures with book and what you would have is a way for you to make a set of complex classes that emulate a proxy. Or you have an adapter class that is a "advanced data structure." I guess what I'm trying to say is that I don't see the point of paying attention to advanced data structures when it's really the way these data structures interact that is complex and important.
The big problem is that, in order to make design patterns work for you, you have to go through a lot of documentation and diagrams. Take this for example, you're writing an OS code and there's a part of the operating system that's going to be changing frequently depending on processor (I don't know a lot about OS development), let's say the scheduler. So what do you do? Do you just willy-nilly throw the scheduler and start going? Well, what would be best is that if you encapsulated the scheduler inside a package or library so that it could easily be exchanged for another one. You try to keep it small and modularize the classes and data structures that change from those that don't. Then, a new processor comes out that might demand a new scheduler and you open up this mini-project, edit/debug it and roll it out as something that just replaces the classes and libraries on all your deployed versions. Sounds like common sense, right? Well, for some it is and for some it isn't but to me this interaction is a complication of data structures and I think that the answer you're looking for isn't necessarily a book called "advanced data structures" just a few that explain how they work.
My work here is dung.
No need to keep re-coding hasing lists when you have access to generic algorithms.
Are you trying to code a linked-list, or are you trying to solve a problem?
I don't know the meaning of the word 'don't' - J
WTF? People who try to move cleverness in low-level critical code should be shot. Look, there's a time and a place for everything, but this is the wrong approach for the problemspace you're working in. You didn't say which kernel you're working with, but odds are extremely good that it already comes with a set of pre-defined macros or functions for handling data structures. Use them. They're much better tested and understood than the stuff you want to play with, and much less likely to make your codevelopers hate you.
I don't want to be a stick in the mud, but seriously, this just isn't the place to start experimenting.
Dewey, what part of this looks like authorities should be involved?
Linked lists are not necessarily a poor choice for a data structure. The cost of adding and removing elements from a balanced binary tree is completely wasted when you only need to add or remove from the beginning or end of a list. A linked list is a perfect data structure for a fifo or deque. All data structures have their advantages and disadvantages; your choice should depend entirely on your requirements.
Yeah, you pretty much eliminate all tree structures, and hence all graphs structures, once you demand better than O(lg n) performance.
The only way to improve on this in slower data structures is to use radix-style keying to cheat, so that your lookup keys tell you exactly how to find the value you're looking for. But this isn't really an "advanced" data structure, just an optimization of a basic one. (In fact, a lot of "advanced" data structures are just such application-specific optimizations; radix keying is obviously dependent on domain knowledge of your data set.)
Almost all interesting data structures (in terms of implementation; lists are good enough for 90% of cases) are trees, or sometimes graphs, and even those are rarely needed in the majority of applications. The exceptions are probably so specialized that you'll never find a single book to cover even an interesting subset. Your best bet at that point is a focused literature search (if you have access to article databases), or perhaps a Google.
You don't really need advanced data structures so much as advanced algorithms for using those data structures. The basic mechanisms for assembling and accessing a list, tree, graph, or hash table are basically the same for all algorithms. There's some clever work on, for example, lock-free data structures for concurrent programming that would probably be useful for a kernel programmer, but that's still not a matter of new data structures, but rather having full knowledge of the concurrency model of the multiprocessor you're working with, and optimizing for it, which is why it's mostly of interest to kernel programmers, and other people coding to the metal.
"But you wear it so well! "
:-)
...... Nothing, they are semantically equivalent to *(a + b) and *(b + a).
:-)
Well, we're all 'special' in our own unique ways.
To be frank, I wouldn't have been quite so harsh (ass-like, if you wish) if he hadn't just happened to make one of mistakes that proved exactly my point about the difference between cons cells and lists.
"Based on your argument C doesn't have arrays, just pointers with some syntactic sugar."
I'd actually agree with that.... A bit of trivia: in C, what's the difference between a[b] and b[a]?
Keep in mind that I'm not saying any of this is a bad thing: in the case of Lisp lists, the added flexibilty that comes from their representation can be tremendously useful. However, it does complicate things. Just as an example, consider what happens when you define a function that traverses a list. Since Lisp lists can't be identified as being finite until you traverse them, you need to either be willing to tolerate functions that don't end when passed certain kinds of 'lists' or include extra baggage to detect when they're in an infinite list. Lisp also doesn't make the guarantee that the last 'next list' in a list is even a list at all: another edge case that's introduced by the fact that Lisp doesn't have first class lists.
Something else to consider is that Lisp doesn't even have to have cons cells at all. There's nothing to keep them from being represented internally as structures, arrays, or even closures. In other words, not only does the LISt Processing language not have lists as a first class type, it doesn't even have to have their component cons cells as a first class type. Lists can entirely be implemented using types the original poster seems to believe Lisp doesn't even have at all. Which brings me to the second reason I was an ass in my response... my experience has been that blanket statements tend to oversimplify things and can ultimately lead to bad decisions. If this guy has really reduced Lisp to nested lists, Perl to hash tables, Embedded programming to arrays, and Databases to B-trees, I question his ability to think critically in any amount of depth about software engineering problems. Of course, he could also be the best developer on the planet... After all, this is Slashdot, so you do have allow for the possibility of incompletely thought through 'pissing contest' posts. Which is the third and final reason I was somewhat an ass.
At the very least the original poster has shown the presence of mind not to keep posting in this thread..