Advanced Data Structures?
mdf356 asks: "It's been 5 years since I left graduate school and started designing and writing software for a living. After 5 years of writing operating systems code, I feel like I've forgotten some of the more advanced data structures I used to know. The next time an interesting problem arises, I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees. I'd like something short and sweet, more in the line of the standard C book. Algorithm Design by Kleinberg and Tardos looks likely to be too basic, but I haven't read it (I'd like to avoid paying $90 for something that won't meet my needs). CLR is far too large and almost exclusively covers basic territory. Tarjan's Data Structures book looks like it has potential, but seems focused on network algorithms, which are unlikely to be applicable to the kernel programming I do. What are some good reference books on more advanced data structures and algorithms, particularly ones with potential applicability to an operating systems kernel?"
Well, let's be fair, when you get to advanced data structures, it's kind of up to you to really do all the imagining that makes them great. Don't dismiss CLR too quickly, as the book has a lot of great concepts that, if you're serious about algorithms and data structures, are quite useful as a starting toolbox.
Honestly though, when I went through college, the very last thing they tried to shove down my throat wasn't advanced data structures but instead Design Patterns. I'm not talking about algorithm design but instead just designs in general. This book might not do it in C but the ideas are pretty language independent. It's your 'cookbook' of designs laid out in class diagrams and sometimes more documentation is provided. Anyways, what I would actually suggest is that you combine a book that covers C structures with book and what you would have is a way for you to make a set of complex classes that emulate a proxy. Or you have an adapter class that is a "advanced data structure." I guess what I'm trying to say is that I don't see the point of paying attention to advanced data structures when it's really the way these data structures interact that is complex and important.
The big problem is that, in order to make design patterns work for you, you have to go through a lot of documentation and diagrams. Take this for example, you're writing an OS code and there's a part of the operating system that's going to be changing frequently depending on processor (I don't know a lot about OS development), let's say the scheduler. So what do you do? Do you just willy-nilly throw the scheduler and start going? Well, what would be best is that if you encapsulated the scheduler inside a package or library so that it could easily be exchanged for another one. You try to keep it small and modularize the classes and data structures that change from those that don't. Then, a new processor comes out that might demand a new scheduler and you open up this mini-project, edit/debug it and roll it out as something that just replaces the classes and libraries on all your deployed versions. Sounds like common sense, right? Well, for some it is and for some it isn't but to me this interaction is a complication of data structures and I think that the answer you're looking for isn't necessarily a book called "advanced data structures" just a few that explain how they work.
My work here is dung.
Algorithms in C, Parts 1-5 (Bundle): Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) by Robert Sedgewick
The Art of Computer Programming, by Donald E. Knuth
Two seminal works on the subject. It really depends on your needs however. Nearly anything you would need can be built off of the subjects in those books, but for more specialized datastructures you will probably have to move to specialized books.
No need to keep re-coding hasing lists when you have access to generic algorithms.
Are you trying to code a linked-list, or are you trying to solve a problem?
I don't know the meaning of the word 'don't' - J
You use linked lists in your kernel?!?
"No matter where you go, there you are." -- Buckaroo Banzai
WTF? People who try to move cleverness in low-level critical code should be shot. Look, there's a time and a place for everything, but this is the wrong approach for the problemspace you're working in. You didn't say which kernel you're working with, but odds are extremely good that it already comes with a set of pre-defined macros or functions for handling data structures. Use them. They're much better tested and understood than the stuff you want to play with, and much less likely to make your codevelopers hate you.
I don't want to be a stick in the mud, but seriously, this just isn't the place to start experimenting.
Dewey, what part of this looks like authorities should be involved?
> I'd like to have more in my toolbox than hashes, linked lists, heaps and various binary and n-way trees.
You forgot arrays (very useful!), but maybe that is just too trivial. Anyway, what exactly are you missing? Most problems I know can be solved using these basic data-structures, or a combination thereof.
If you really want to, you can actually get by with just one data-structure:
Lisp uses only nested linked lists, that can be interpreted as binary trees.
Perl only really knows hashes. Arrays and lists exist, but the support is weak.
Embedded programs often use arrays only, sometimes extended with linked lists.
Databases use B-trees pretty much exclusively. They may not always be the fastest solution, but they perform well enough under a wide range of circumstances.
Of course you can go and use advanced graph theory. It may even have its benefit, but will the next person working on your code know what you are doing? Lets face it, most people are spoiled by "convenient" languages. They know array and dictionary, that's as far as most people every get.
Thomas
Not an encyclopedia, like CLR(S) or Knuth, but rather a tutorial of many useful data structures and algorithms. Doesn't have the breadth of the "encyclopedic" books but what it covers, it covers very well.
The drawback of this book, from your perspective, is that it isn't C-specific. It's usually transparent how to translate the code into C (or anything similar) but you can't just paste it into your code.
Am I part of the core demographic for Swedish Fish?
What type of "advanced data structures" are you alluding to? I cant imagine needing anything more complex than say, red-black trees , in kernel development.
And as someone already pointed out, use the predefined macros. Failing that you can always check out
http://lxr.linux.no/
And to who ever made the comment about using lists in kernel source... *sigh* probably not worth it.
I was taught Computer Algorithms / C++ by Horowitz, Sahni and Rajasekaran( I believe they have a pseudocode version) and it was great. The seemed to be a perfect balance between the implementation and the ideas behind them.
"Patterns" is a taxonomy of class types which frequently recur in object-oriented programming such a C#, Java, and C++. Classes which capture data state are pretty straight forward to work out. But its less clear how to organize classes which are more functional or control in nature and components of large software systems. Thats where the identification and study of class patterns is useful. There have been a dozen books on the subject.
You could have a look at Network Algorithmics:
i sciplinary-Designing-Networking/dp/0120884771
http://www.amazon.com/Network-Algorithmics-Interd
Data Structures and Algorithms (ISBN 0201000237) by Aho, Ullman and Hopcroft is a classic that belongs on every programmer's bookshelf (hell, keep it on your desk!).
Well, like another poster said, don't dismiss CLR too soon--in my experience, most problems could be solved by clever applications of one or more simpler data structures, rather than a really advanced one. Regardless, take a look at the lecture notes (scribe notes are decent quality) for 6.897 at MIT: http://theory.lcs.mit.edu/classes/6.897/spring05/l ec.html
They cover a good sampling of recent and important results, at least theoretically important (some data structures that achieve a log log n speed up over basic ones you'd obviously never want to implement)
Network algorithms are useful in an astoundishing number of places, especially in those cases where all flows are either 0 or 1. You just need to expand you idea about what the networks and flows represent.
E.g., my professor used network algorithms when writing the software that matches medical schools and students. Each school prioritizes it's candidate students, each student prioritizes his schools. The program finds the best matches. He said it had taken many hours (12 hours?) to do originally, but with the network flow it was a few minutes on an old kaypro.
Another example was determining when teams are mathematically eliminated from playoffs. It sounds like a simple problem, but it isn't since a team could lose a game yet remain in the running because a different team lost its own game.
A decade ago these algorithms would be overkill for a kernel. Today... I can definitely see how these algorithms could make a difference in some places.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Skiena's "Algorithm Design Manual" is a useful compendium of algorithms and data structures. It's organized as a dictionary of design problems, with a survey of solution methods, hints about which ones work best under which circumstances, and pointers to source code.
While it doesn't go into detail about the algorithms and data structures, it lets you figure out which ones are best candidates to spend more time learning about.
Im suprized no one suggested wikipedia yet. To start try http://en.wikipedia.org/wiki/List_of_data_structur es
I wonder how much he is making from refering the books on amazon (4% of X sales from slashdotting) ?
Good idea, wish I thought of it first.
---- aut viam inveniam aut faciam
If you are looking for the best/craziest/most efficient data structures out there then you need to go through some of the computer science journals. It can be a royal pain to crawl through them looking for what you want, but that's where you will find all the "advanced" data structures. They also provide all of the information on how to use them, behavior, best uses and performance information on those structures, (including proofs if you care).
If you are just looking to refresh your memory, or learn a few more data structures, then the web is a good place to start. Just make sure to copy/save/take notes on what you find so you don't have to search when you need that cool data structure you read about 2 months ago that gave you O(1) performance in your situation. Reference books are also good, but most cover the same 80% so after 1 or 2 books you are spending a lot for just a few new pages.
Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
A few years ago, I took Chris Okasaki's "Advanced Data Structures" course at Columbia. Looking at the Wikipedia Data Structures article others have mentioned, it's pretty good, but doesn't include, e.g. Pairing Heaps, which have excellent real-world performance. You might want to check out Chris's book Purely Functional Data Structures (ISBN 0521663504). Even though you're not working with a functional language, it's a good catalog of some recent data structures, and from what I saw in the class, he understands them at a very deep level.
"Algorithms and Data Structures in C++", by Leendert Ammeraal, Wiley, 1996. It is the only book I found that has good coverage of Tries.
"No matter where you go, there you are." -- Buckaroo Banzai
They aren't really arrays, but rather trees. They are really nice. They are cache-friendly without needing to be tuned for a particular size of cache.
If you make a few macros or inline functions for them, then using them everywhere becomes practical.
Purely Functional Datastructures by Okasaki.
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
I've read many (around 15) data structures and algorythms books. Here is the best one that you will still be able to get in print: Tomes of delphi: algorithms and data structures Julian M Bucknall It covers advanced structures and walks you through improvements on algorythms. Basically it teaches you why and how some algorythms are most suited to various situations. Also, since it is in Pascal it is usually easier to port the code to more modern languages than C. The only drawback in this book is its binary tree algorythm, which I don't like and don't understand (his version anyway). Though I have several other books that cover that well, and since you are looking for more advanced stuff then you won't need that particular tree algorythm either.
"The Handbook of Algorithms and Data Structures" (in Pascal and C) Second Edition, by G.H. Gonnet and R. Baeza-Yates, ISBN 0-201-41607. It's a compact reference and analysis of a pretty broad range of algorithms. I also strongly second some of the other suggestions above such as Sedgewick, and Aho, Hopcroft, and Ullman. "The Stanford GraphBase" by Knuth, ISBN 0-201-54275-7 is great, though focused obviously. Beyond the basics, you'll end up looking in very specific books, such as "Text Compression" by Bell, Cleary, and Witten, ISBN 0-13-911991-4 or similar works applicable to the problem at hand. Some of the most interesting data structures I've seen I first encountered in journals. I can't seem to find it now, but it was in Journal of the ACM, if I recall correctly, I first learned a scheme to manage small sets that made membership-test, insertion, deletion, and finding the next member of the set all constant-time operations.
Where possible, of course, you don't want to implement these from scratch. You want to understand them so you know which apply to your problem, but then get code from, e.g. for C++, the Standard Library and Boost, both of which can also be a shopping list to spark further investigations.
'The Algorithm Design Manual' by Steven Skiena at Stony Brook is a very pragamatic book for intermediate data structures and algorithms. This is a definitely aimed at the practitioner who must get something up and running, without having to worry about big-Oh notation etc. There's an online version of the book but it's just so good, I'd recommend that anyone who needs this information, snap up a copy. Skiena also maintains The Stony Brook Algorithm Repository It's a treasure, as it includes implementations of some of these algorithms in a variety of programming languages. The web page does look somewhat dated, but as a good first step into intermediate data structures and algorithms.
Goodrich and Tamassia have a good general algorithms book. It's much shorter than CLRS, but it is also significantly more readable. They also have an "Algorithms in C++" and "Algorithms in Java" variants coauthored with others that have fairly substantial code samples.
The Algorithm Design Manual is pretty good.
// TODO: Insert Cool Sig
Although mentioned in a negative light,
This technique is important when you have to move data between big-endian and little-endian processors. It lets you write portable I/O routines.
Anyway, the Design Patterns book contains patterns for problems imminent in the class of languages it concentrates in. In different language classes different problems arise. Higher patterns than those in that book would be more poignant to a larger set of languages.
I was the real korpiq until I woke up clowned.
you just made yourself look like an ass.
Try also the Handbook of Algorithms and Data Structures:
r uctures-Pascal/dp/0201416077/sr=8-1/qid=1160102805 /ref=sr_1_1/102-2768899-0462565?ie=UTF8&s=books
http://www.amazon.com/Handbook-Algorithms-Data-St
If you're doing kernel development, you may want to look into (or indeed focus on) lock free and wait free data structures. I'm not sure if there are any books out on this, but there are some great resources on the web such as
;-)
http://works.music.columbia.edu/LFDS
as well as a mailing list at
http://tech.groups.yahoo.com/group/liblf-dev/
The group is developing an open source lock free data structures library - I'm just a lurker waiting for the library to become available
The posts so far at the Yahoo site are a gold mine of links to sites and papers.
It's OK Bender, there's no such thing as 2.
If you barely understand it and have to look it up in a book, you're doing the next guy to work on your code (and your organization) no favors.
Make stuff ONLY as complex as it NEEDS to be. That's the job of a good engineer.
If you're bored, perhaps you're struggling with the fact that 80% of what is done with computers today ONLY requires linked lists, hashes, and that dull "generic" stuff to get the job done.
No need to build the Golden Gate Bridge to span the 10 foot wide creek in your back yard. In fact, it's downright bad design.
Your boredom with those data structures isn't cause enough to abandon them, no more than we're all going to stop welding steel plates together when building large structures.
Engineering is engineering. Computer engineers need MORE structure and standards, not less by any means.
The truly outstanding algorythm geniuses also know WHEN to apply them, and then they have to see MASSIVE performance gains to make them worthwhile -- because they know their code is highly disruptive technology to those who don't understand it and might have to maintain it someday.
And the really amazing ones always mentor/teach a few people to understand it later to take over maintenance.
+++OK ATH
Agree entirely. Most of the code you'll encounter in your working career _will_ require the same "boring" structures, certainly in the business and financial sector. That's been my experience in 18 years of coding C, C++, C# and others. Yes I think Design Patterns are neat like everyone else and have used a few myself, but as the parent says you are not doing anyone else a favour if you've just included them for the sake of it, especially if you are retrofitting them into the bulk of existing code that happens to function just fine using 'boring' structures.
any thoughts on Judy arrays http://judy.sourceforge.net/
I think it is an interesting concept.. There are some concerns about it at http://www.nothings.org/computer/judy/
I think the concept is good.. Do you think it is a feasible solution for a highly scalable data structure?
Look at the ACCU book reviews, preferably by subject for books on Algorithms and Data structures (there's a lot of overlap).
I have had enough of the hashes and B-trees.. Any thoughts on Judy arrays http://judy.sourceforge.net/ I think it is an interesting concept.. There are some concerns about it at http://www.nothings.org/computer/judy/ . I think the concept it good.. Do you think it is a feasible solution for a highly scalable data structure?
At the risk of sounding like I'm trying to start a flame war, let me be the first to say how stupid it sounds for an intelligent person that has not READ a book to call it TOO basic for something like kernel programming.
Sounds to me like you're looking for a cookbook. Not to learn. Programming is an art and a science. You have to learn how to walk before you run and run before you fly. You cannot become a kernel programmer just by learning only those things you THINK are needed to program a kernel.
That's like saying you can build an airplane by learning to fly one. Get a grip, and do a little light reading.
Having had to teach a Lotus Notes developer how to do a simple sort, I can honestly say there is nothing more basically flawed than a lack of basic understanding.
Computer Science has been around a long time, as has kernel programming. Those things you are dismissing as basic are the very fundamentals of your trade.
Ante up, or move on. Or as Yoda put it, do or do not. There is no try.
I have the Kleinberg/Tardos book for one of my classes. You're right, it is pretty basic -- it does not really go into implementation, it is more about the theory behind certain algorithms and proving that they work. If you are looking more for different approaches to certain problems, and suggestions about implementing them, I would not suggest this book.