How to Build a Search Engine
CowboyRobot writes "Three years ago, former Infoseek developer Matt Wells decided to go solo and build his own search engine, Gigablast.
In this article, Infoseek founder Steve Kirsch interviews his former employee about the process and challenges of creating a modern, scalable search engine. From the article: 'Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast. It's a tight little community, and a lot of the people know and watch each other. Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.'"
I have to say, that list makes no sense. Maybe if you'd switch "Gigablast" with "MSN", you'd have a list of some of the major search engines, but it sounds like this guy is just tooting his own horn (and without the proper credentials).
--
http://nemilar.net - Not your grandmother's soup kitchen
That sounds a lot like self-advertisement to me. And there are A LOT more than just five companies! Take MetaCrawler and DogPile for instance -- they aren't on his list.
"Instant gratification takes too long." - Carrie Fisher
We all win. With the increasing number of sites, content, web services, spam, popup attacks, and "please allow us to rape your computer" certificates to download, it becomes ever more difficult to find what you're looking for via Google's site ranking technology, especially when it's not something that everyone else looks for. (Those certificates are the main reason I use Firefox when on Windows now: you can't tell I.E. not to accept those damned installation certificates, nor to block requests to change the homepage.) Because they fight to be the best, we get cool things like ftp searches, grep and regexp searching of dmoz.org, video, image, and music searches, even Linux- and BSD-specific search pages. Gmail, Microsoft's entry, and now Gigablast are all rewards we get to reap from each company attempting to set its roots deeper into the Internet, like weeds vying for the same piece of dirt. We are extremely lucky, but I doubt more than a handful of search engines will ever hold top ranks at one time, because they are so specialized in what they do. Just hope Gigablast and Google don't decide to create a new IM service, too.
--I gots 99 problems but a new machine ain't one!
AMD! Asus! Whoot! 6 years!
I'd just prefer it if search engines had enhanced rules for the robots.txt file so a webmaster could tell them more specifically how they want to be searched.
Yes, I know you can put in a delay between page searches, and you can deny access to parts or all of the site, and you can even tell some or all crawlers to take a flying leap, but I'd like to tell them at the front door, "Search on Wednesday, make it fast, do a thorough job, and don't come back for a week."
Too much to ask, right?
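For anyone curious what a crawler can already read at the front door, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the paths, and the "ExampleBot" name are made up for illustration. Crawl-delay and Request-rate are non-standard extensions that only some crawlers honor, and nothing here expresses a "Wednesdays only" rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and numbers are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
Request-rate: 1/5
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; it falls under the "*" rules above.
print(rules.can_fetch("ExampleBot", "/private/report.html"))  # disallowed path
print(rules.can_fetch("ExampleBot", "/index.html"))           # allowed path
print(rules.crawl_delay("ExampleBot"))    # seconds to wait between requests
print(rules.request_rate("ExampleBot"))   # named tuple (requests, seconds)
```

Whether a given crawler actually respects the delay or the rate is up to that crawler, which is more or less the complaint above.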
Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.
Oh yeah real nervous. They're getting on the bandwagon late; too late to monopolize this particular free (as in shut the fuck up) service. If by some miracle they produce something 'threatening', it will be because it's good or because the others have slacked off.
Everybody knows what Microsoft is bringing. Well almost everybody. Okay, I'll spell it out:
1: Bring lots of money.
2: Buy out a competitor.
3: Rename it Microsoft Search.
4: Attempt to trademark the word "Search".
5: Bind it tightly into Windows as an essential service.
6: Don't get it right until version 3.0.
7: Profit!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
You may license my patent on this idea for reasonable terms in exchange for shares of your company's stock.