Probing Hash Tables?
David Rusenko asks: "I've been taking a datastructures class at CMU as part of a summer CS program. One of these structures we have gone over is hash tables. After going through many different probing methods (linear, quadratic), multiple hash functions, and double hashing, I was all too curious to know if these are the best methods currently known. Some other interesting ideas came up, such as using the Fibonacci numbers for probing, but I haven't had time to test them yet. Any comments?"
What's that about a little bit of knowledge and it being dangerous. In school you'll beat every algorithm under the sun to death. In the real world you'll link to the STL and use a hash map.
Write well-designed, clean, maintainable code. Then profile it. If your table lookup blips on the profiler, then *THINK* about optimizing it. After you've *THOUGHT* about optimizing it, then decide if it makes sense to squeeze the time and effort into the schedule.
Random quotes:
Premature optimization is the root of all evil. -- Knuth
There is never a best solution, only tradeoffs to consider. --Eberly
The best optimizer is between your ears. --Abrash
The Ferrai's are only gravy, honest! --Carmack
Alright I threw the last one in for shits and giggles. Don't blame me, I can't get through a Sunday night without drinking a poop load of beer. Anyway, the point is that there is a difference between *FAST* and *FAST ENOUGH*. If it's fast enough, who cares.
[CYNICISM]
Unless you're an academic and you're looking for funding.
[/CYNICISM]
In the worst case, double hashing is better than quadratic probing, and quadratic probing is better than linear probing. But the trick for finding the "right" method is to ensure that the worst case never happens. Then you can avoid expensive advanced stuff, and keep it simple and lean.
If you don't know much about your data, or your keys, or your insertion/removal pattern, than a separate chaining scheme might be best. If you know everything, then you can use a perfect hash function generator (but even that doesn't necessarily guarantee best performance). The reason they teach you several different methods is because there is no single "best" answer.
I better add before someone makes a tragic mistake, enlarge the table only if you're getting enough collisions to actually slow your program down enough to matter. If you have a lot of entries you'll get a few collisions even with an enormous table that's mostly empty, because of the birthday paradox. Don't worry about it til you're getting collisions on a significant fraction of your lookups.