Slashdot Mirror


Probing Hash Tables?

David Rusenko asks: "I've been taking a datastructures class at CMU as part of a summer CS program. One of these structures we have gone over is hash tables. After going through many different probing methods (linear, quadratic), multiple hash functions, and double hashing, I was all too curious to know if these are the best methods currently known. Some other interesting ideas came up, such as using the Fibonacci numbers for probing, but I haven't had time to test them yet. Any comments?"

3 of 48 comments (clear)

  1. Who cares? by Screaming+Lunatic · · Score: 5, Insightful
    It's a table. Which means it'll be faster than a list. That's all you have to worry about.

    What's that about a little bit of knowledge and it being dangerous. In school you'll beat every algorithm under the sun to death. In the real world you'll link to the STL and use a hash map.

    Write well-designed, clean, maintainable code. Then profile it. If your table lookup blips on the profiler, then *THINK* about optimizing it. After you've *THOUGHT* about optimizing it, then decide if it makes sense to squeeze the time and effort into the schedule.

    Random quotes:

    Premature optimization is the root of all evil. -- Knuth

    There is never a best solution, only tradeoffs to consider. --Eberly

    The best optimizer is between your ears. --Abrash

    The Ferrai's are only gravy, honest! --Carmack

    Alright I threw the last one in for shits and giggles. Don't blame me, I can't get through a Sunday night without drinking a poop load of beer. Anyway, the point is that there is a difference between *FAST* and *FAST ENOUGH*. If it's fast enough, who cares.

    [CYNICISM]

    Unless you're an academic and you're looking for funding.

    [/CYNICISM]

  2. Remain simple... by joto · · Score: 3, Insightful
    Remember that every complication to the algorithm makes it slower. When you move from linear to quadratic probing, you loose memory locality (think cache). When you move from quadratic probing to double hashing you do another function call (or at least, some calculation). A fibonacci scheme would be similar to quadratic probing, except it requires one more addition, I can't see immediately see any benefits to that, but maybe I'm not thinking hard enough about it...

    In the worst case, double hashing is better than quadratic probing, and quadratic probing is better than linear probing. But the trick for finding the "right" method is to ensure that the worst case never happens. Then you can avoid expensive advanced stuff, and keep it simple and lean.

    If you don't know much about your data, or your keys, or your insertion/removal pattern, than a separate chaining scheme might be best. If you know everything, then you can use a perfect hash function generator (but even that doesn't necessarily guarantee best performance). The reason they teach you several different methods is because there is no single "best" answer.

  3. Correction by splorf · · Score: 3, Insightful

    I better add before someone makes a tragic mistake, enlarge the table only if you're getting enough collisions to actually slow your program down enough to matter. If you have a lot of entries you'll get a few collisions even with an enormous table that's mostly empty, because of the birthday paradox. Don't worry about it til you're getting collisions on a significant fraction of your lookups.