Slashdot Mirror


How to Build a Search Engine

CowboyRobot writes "Three years ago, former Infoseek developer Matt Wells decided to go solo and build his own search engine, Gigablast. In this article, Infoseek founder Steve Kirsch interviews his former employee about the process and challenges of creating a modern, scalable search engine. From the article: 'Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast. It's a tight little community, and a lot of the people know and watch each other. Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.'"

8 of 270 comments

  1. That list makes no sense by jonman_d · · Score: 5, Insightful

    I have to say, that list makes no sense. Maybe if you'd switch "Gigablast" with "MSN", you'd have a list of some of the major search engines, but it sounds like this guy is just tooting his own horn (and without the proper credentials).

  2. Re:Lol by SphericalCrusher · · Score: 5, Insightful

    That sounds a lot like self-advertisement to me. And there are A LOT more than just five companies! Take MetaCrawler and DogPile for instance -- they aren't on his list.

    --
    "Instant gratification takes too long." - Carrie Fisher
  3. Competition, in this case... a good thing by Jtoxification · · Score: 4, Insightful

    We all win. With the increasing # of sites, content, web services, spam, popup attacks, and "please allow us to rape your computer" certificates to download (that's the main reason I use Firefox when on Windows now: because you can't tell I.E. to not accept those damned installation certificates, nor block requests to change the homepage), it becomes that much more difficult to find what you're looking for via Google's site ranking technology, especially when it's not something that everyone else looks for. Because they fight to be the best, we get cool things like ftp searches, grep and regexp searching of dmoz.org, video, image, and music searches, even Linux and BSD search-specific pages. Gmail, Microsoft's entry, and now Gigablast are all rewards we get to reap from each company attempting to set its roots deeper into the Internet like weeds vying for the same piece of dirt. We are extremely lucky, but then I doubt more than a handful of search engines will ever hold top ranks at one time, due to the fact that they are so specialized in what they do. Just hope Gigablast and Google don't decide to create a new IM service, too.

    --
    --I gots 99 problems but a new machine ain't one!
    AMD! Asus! Whoot! 6 years!
  4. Searching from the server's perspective by no+longer+myself · · Score: 5, Insightful
    Having a webserver hobby, I see the search engines crawl through my site daily. Of course in the beginning they hungrily tripped through the pages, taking in as much as could be found. As time went on, it seemed like some of the search engines had a new method of just grabbing a page or two every hour or so. I imagine this was to prevent over-taxing my box, but it made the first glance at my logs look artificially inflated, as if people were visiting the site instead of just a crawler working its way through... slowly and painfully.

    I'd just prefer it if search engines would have enhanced rules for the robots.txt file so a webmaster could tell them more specifically how they want to be searched.

    Yes, I know you can put in a delay between page searches, and you can deny access to parts or all of the site, and you can even tell some or all crawlers to take a flying leap, but I'd like to tell them at the front door, "Search on Wednesday, make it fast, do a thorough job, and don't come back for a week."

    Too much to ask, right?
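
For reference, here is a minimal sketch of the controls robots.txt does offer today, read the way a well-behaved crawler might read them. It uses Python's standard `urllib.robotparser`. The file contents, the agent names (`MyCrawler`, `BadBot`) and the helper `crawl_policy` are made up for illustration. `Crawl-delay` and `Request-rate` are non-standard extensions that only some crawlers honor, and nothing here can express "search on Wednesday" or "don't come back for a week".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only. Crawl-delay and
# Request-rate are non-standard extensions; not every crawler obeys them.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/60
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def crawl_policy(robots_txt, agent, url):
    """Return what the given robots.txt text lets `agent` do with `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    rate = parser.request_rate(agent)  # named tuple (requests, seconds) or None
    return {
        "allowed": parser.can_fetch(agent, url),
        "crawl_delay": parser.crawl_delay(agent),  # seconds between fetches, or None
        "request_rate": (rate.requests, rate.seconds) if rate else None,
    }


if __name__ == "__main__":
    print(crawl_policy(ROBOTS_TXT, "MyCrawler", "http://example.com/index.html"))
    print(crawl_policy(ROBOTS_TXT, "BadBot", "http://example.com/index.html"))
```

So the per-request pacing and the "take a flying leap" cases are covered, but the schedule-style wishes have no directive to hang on.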

  5. Uhm No by Tedium+Unleased · · Score: 5, Insightful

    Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.

    Oh yeah real nervous. They're getting on the bandwagon late; too late to monopolize this particular free (as in shut the fuck up) service. If by some miracle they produce something 'threatening', it will be because it's good or because the others have slacked off.

    1. Re:Uhm No by Anonymous Coward · · Score: 3, Insightful

      yeah! they only do well if they are first, you know, like with excel, and internet explorer, and a graphical user interface.

  6. Microsoft Party Crashing by Nom+du+Keyboard · · Score: 3, Insightful
    Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.

    Everybody knows what Microsoft is bringing. Well almost everybody. Okay, I'll spell it out:

    1: Bring lots of money.
    2: Buy out a competitor.
    3: Rename it Microsoft Search.
    4: Attempt to trademark the word "Search".
    5: Bind it tightly into Windows as an essential service.
    6: Don't get it right until version 3.0.
    7: Profit!

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  7. What I'd Like To See In A Search Engine by Nom+du+Keyboard · · Score: 5, Insightful
    What I'd like to see in a search engine is a page kill or broken link feature to keep it current. If I click a link that is broken or vastly changed (e.g. the link to ancient Chinese pottery is now a porn site), I could back up to the search results page and click a link to have them immediately re-crawl that page. I think it would make for better results, and am surprised that it's not already common.

    You may license my patent on this idea for reasonable terms in exchange for shares of your company's stock.
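
A rough sketch of how the report-and-recrawl idea above might look on the crawler side, assuming the engine keeps a priority queue of URLs to revisit. The class `RecrawlQueue` and its method names are hypothetical, not anything an actual search engine is known to use. Reported URLs simply jump ahead of routinely scheduled ones.

```python
import heapq


class RecrawlQueue:
    """Hypothetical scheduler: user-reported URLs are recrawled before routine ones."""

    REPORTED, ROUTINE = 0, 1  # lower number = more urgent

    def __init__(self):
        self._heap = []    # entries are (priority, insertion order, url)
        self._counter = 0  # tie-breaker so equal priorities stay first-in, first-out
        self._best = {}    # url -> most urgent priority currently queued

    def _push(self, priority, url):
        # Skip if the URL is already queued at the same or higher urgency.
        if self._best.get(url, priority + 1) <= priority:
            return
        self._best[url] = priority
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1

    def schedule(self, url):
        """Queue a URL for its normal periodic recrawl."""
        self._push(self.ROUTINE, url)

    def report_broken(self, url):
        """A user clicked 'this link is broken or changed' on the results page."""
        self._push(self.REPORTED, url)

    def next_url(self):
        """Pop the most urgent URL, or None when the queue is empty."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._best.get(url) == priority:  # ignore stale, superseded entries
                del self._best[url]
                return url
        return None


if __name__ == "__main__":
    q = RecrawlQueue()
    q.schedule("http://example.com/a")
    q.schedule("http://example.com/pottery")
    q.report_broken("http://example.com/pottery")
    print(q.next_url())  # the reported page comes out first
```

A real deployment would also need some defense against people spamming the report link, which this sketch ignores.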

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."