Slashdot Mirror


Understanding Search Engines?

An anonymous reader asks: "I guess by now we can be fairly certain that search engines are here to stay, so I'm trying to understand how the technology works. I'm not so much looking for a particular 'best' technology or implementation, but rather an overview of the different approaches and their trade-offs. I'd like something that would teach me, for example: which approaches work in a distributed vs. a centralized infrastructure; how different algorithms perform on complete search words vs. arbitrary substrings; and how mass storage (hard disk vs. solid state) affects implementation choices. For most mature technologies there is a host of 'overview' books and papers that answer questions like these, but I couldn't find anything on search engines. Where should I look? Are there any good books or papers?"
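The whole-word vs. substring trade-off the question raises can be illustrated with two toy indexes. This is a minimal in-memory sketch, assuming a made-up three-document corpus and plain whitespace tokenization; the names `build_word_index`, `build_trigram_index`, `word_query` and `substring_query` are hypothetical. An inverted index answers whole-word queries with one dictionary lookup but cannot find "disk" inside "disks"; a trigram index can match arbitrary substrings, at the cost of a larger index and a verification pass over candidate documents.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> text.
DOCS = {
    1: "search engines are here to stay",
    2: "distributed search infrastructure",
    3: "solid state storage and hard disks",
}

def tokenize(text):
    """Lowercase and split on whitespace (real engines do much more)."""
    return text.lower().split()

def build_word_index(docs):
    """Inverted index: whole word -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def trigrams(s):
    """All overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(docs):
    """Trigram index: 3-character substring -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text.lower()):
            index[gram].add(doc_id)
    return index

def word_query(index, word):
    """Exact whole-word lookup: a single dictionary access."""
    return set(index.get(word.lower(), set()))

def substring_query(index, docs, pattern):
    """Substring lookup: intersect trigram postings, then verify candidates."""
    pattern = pattern.lower()
    grams = trigrams(pattern)
    if not grams:
        # Patterns shorter than 3 characters fall back to a full scan.
        return {d for d, t in docs.items() if pattern in t.lower()}
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing every trigram does not guarantee a match, so check the text.
    return {d for d in candidates if pattern in docs[d].lower()}

if __name__ == "__main__":
    word_index = build_word_index(DOCS)
    tri_index = build_trigram_index(DOCS)
    print(word_query(word_index, "search"))           # {1, 2}
    print(word_query(word_index, "disk"))             # set(): "disks" is a different word
    print(substring_query(tri_index, DOCS, "disk"))   # {3}
```

The trigram index stores an entry for nearly every character position, which is one reason substring search tends to cost more space than whole-word search; how that plays out on disk vs. solid-state storage or across many machines is beyond this toy.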

3 of 49 comments

  1. Learn math by 2.7182 · · Score: 4, Insightful

    SIAM Review had a survey article on different search methods recently. You need to know linear algebra, combinatorics, and probability.

  2. Class by addaon · · Score: 2, Insightful

    Take a class on information retrieval from your local university.

    --

    I've had this sig for three days.
  3. Re:Related Question: by Flwyd · · Score: 2, Insightful

    Look at your page in Lynx, the text-mode browser.

    Web crawlers can't see the text in your images, and weird HTML constructions can make it hard to parse the text back out. If your page content can be clearly expressed in plain text, there's a good chance a search engine will know what you're talking about.

    As an added bonus, if a web crawler can read your pages, so can blind users.

    --
    Ceci n'est pas une signature. (This is not a signature.)
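To see where the linear algebra from the first comment comes in: one classic ranking approach, the vector-space model, treats each document and each query as a vector of term counts and ranks documents by the cosine of the angle between them. This is a minimal sketch, assuming raw term-frequency weights (no TF-IDF or other weighting) and a made-up corpus; `vectorize`, `cosine`, `rank` and `SAMPLE_DOCS` are hypothetical names, and this is not necessarily the method the SIAM Review survey describes.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document ID -> text.
SAMPLE_DOCS = {
    "a": "linear algebra for search engines",
    "b": "probability and combinatorics",
    "c": "search engines search the web",
}

def vectorize(text):
    """Sparse term-frequency vector: term -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse vectors (Counter returns 0 for missing terms)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Document IDs sorted by descending cosine similarity to the query."""
    q = vectorize(query)
    scores = {doc_id: cosine(q, vectorize(text)) for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    print(rank("search engines", SAMPLE_DOCS))  # ['c', 'a', 'b']
```

Here "c" (about 0.80) outranks "a" (about 0.63) because it mentions "search" twice relative to its length; "b" shares no terms with the query and scores 0.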
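Flwyd's "look at your page in Lynx" test can be approximated in a few lines: strip the markup, drop script and style contents, and keep only the text nodes. This is a minimal sketch using Python's standard `html.parser`, not how any particular search engine's crawler works; `VisibleTextExtractor` and `visible_text` are hypothetical names. It assumes, as text-mode browsers such as Lynx typically do, that an image's `alt` attribute is shown in its place. Words drawn inside the image file itself never appear, because they are not in the markup at all.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes and image alt text, skipping <script>/<style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img" and self._skip_depth == 0:
            alt = dict(attrs).get("alt")  # None if missing or valueless
            if alt and alt.strip():
                self.parts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    """Return the page's text roughly as a text-mode browser would show it."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)

if __name__ == "__main__":
    page = (
        '<html><head><style>h1{color:red}</style></head><body>'
        '<img src="logo.png"><h1>Widgets</h1>'
        '<img src="price.png" alt="Price list">'
        '<script>var x = 1;</script><p>We sell widgets.</p></body></html>'
    )
    print(visible_text(page))  # Widgets Price list We sell widgets.
```

The logo without `alt` text contributes nothing, so anything written only inside that image is invisible to this kind of text extraction, which is the point of the comment.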