How to Build a Search Engine
CowboyRobot writes "Three years ago, former Infoseek developer Matt Wells decided to go solo and build his own search engine, Gigablast.
In this article, Infoseek founder Steve Kirsch interviews his former employee about the process and challenges of creating a modern, scalable search engine. From the article: 'Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast. It's a tight little community, and a lot of the people know and watch each other. Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.'"
I have to say, that list makes no sense. Maybe if you swapped "Gigablast" for "MSN", you'd have a list of some of the major search engines, but it sounds like this guy is just tooting his own horn (and without the proper credentials).
--
http://nemilar.net - Not your grandmother's soup kitchen
I mean, I know they're different sites and all, but isn't the Yahoo site just the Google search bar with all those category links added?
I'd never heard of them either, but I've heard of all the others there.
What a load of shit. This guy works on one search engine, then compares his engine to the other top four competitors. What about alltheweb.com, for instance? I've at least heard of that one, and it ain't there.
It's like Linux One (remember them?) claiming there are four main Linux distributions: Red Hat, Debian, Slackware, and Linux One.
I know that other people must use search engines other than Google, but who? And why? I could see Netscape, because it's the default homepage for many browsers, and maybe Ask Jeeves due to the easy syntax, but why would people go out of their way to use Gigablast or Looksmart? Who's even heard of those two?
Apple has never claimed not to be evil, they're just very stylish about it.
That sounds a lot like self-advertisement to me. And there are A LOT more than just five companies! Take MetaCrawler and DogPile for instance -- they aren't on his list.
"Instant gratification takes too long." - Carrie Fisher
What about Lycos, you insensitive clod? They're still around.
In the UK around the year 2000, they advertised Lycos on TV. The advert featured a bagpiper wearing a kilt and no underpants who asked Lycos to find him some. A dog then went off at great speed and came back with underpants in its jaws, so the bagpiper could safely play the bagpipes even in sudden gusts of wind. Anyway, just for fun, I typed 'underpants' into Lycos, and the first result it came up with was a pornographic website. However, this was lycos.com, not lycos.co.uk, which is what was advertised.
We all win. With the increasing number of sites, content, web services, spam, popup attacks, and "please allow us to rape your computer" certificates to download (that's the main reason I use Firefox on Windows now: you can't tell IE not to accept those damned installation certificates, nor block requests to change the homepage), it becomes ever more difficult to find what you're looking for via Google's site ranking technology, especially when it's not something everyone else looks for. Because they fight to be the best, we get cool things like FTP searches, grep and regexp searching of dmoz.org, video, image, and music searches, even Linux- and BSD-specific search pages. Gmail, Microsoft's entry, and now Gigablast are all rewards we get to reap from each company trying to set its roots deeper into the Internet, like weeds vying for the same patch of dirt. We are extremely lucky, but I doubt more than a handful of search engines will ever hold top ranks at one time, since they are so specialized in what they do. Just hope Gigablast and Google don't decide to create a new IM service, too.
--I gots 99 problems but a new machine ain't one!
AMD! Asus! Whoot! 6 years!
Right now, one difference between Gigablast and Google is that Gigablast doesn't seem to index PDF files. This makes me sad, since I run a web site whose sole purpose is to serve up big PDF files.
There are also some minor usability problems compared to Google. If your search returns more than 10 results, you can't tell how many there are. You have to understand how to do "+keyword" and "-keyword" -- there doesn't seem to be a form you can fill in like Google's "advanced search" form.
It does seem to be pretty darn fast, though, and on the searches I tried, it gave reasonable results.
Find free books.
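For anyone who hasn't used the "+keyword" / "-keyword" syntax: in the classic convention (assumed here; I haven't checked Gigablast's exact rules), "+" marks a term every result must contain and "-" marks a term no result may contain. A minimal sketch over a toy in-memory list of documents; `parse_query` and `matches` are hypothetical names, not anything from Gigablast.

```python
def parse_query(query: str) -> tuple[set[str], set[str], set[str]]:
    """Split a query into required (+term), excluded (-term), and plain terms."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(doc: str, required: set[str], excluded: set[str]) -> bool:
    """A document qualifies if it has every required term and no excluded one."""
    words = set(doc.lower().split())
    return required <= words and not (excluded & words)


docs = [
    "free pdf books on linux",
    "free books about windows",
    "linux kernel pdf manual",
]
req, exc, _ = parse_query("+linux -windows books")
print([d for d in docs if matches(d, req, exc)])
# prints ['free pdf books on linux', 'linux kernel pdf manual']
```

In this sketch the plain terms ("books") are ignored; a real engine would presumably use them for ranking rather than filtering.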
I'd just prefer it if search engines supported enhanced rules in the robots.txt file, so a webmaster could tell them more specifically how they want their site crawled.
Yes, I know you can put in a delay between page fetches, you can deny access to parts or all of the site, and you can even tell some or all crawlers to take a flying leap, but I'd like to tell them at the front door, "Search on Wednesday, make it fast, do a thorough job, and don't come back for a week."
Too much to ask, right?
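The things robots.txt can already express (a per-crawler ban, a path ban, a delay between fetches) look roughly like this. A sketch, assuming a hypothetical site at example.com and made-up crawler names "BadBot" and "GoodBot"; it uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read the file. Note that Crawl-delay is a nonstandard extension that only some crawlers honor, and as far as I know there is no standard directive for a schedule like "Wednesdays, then not again for a week".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one crawler outright, and ask everyone
# else to wait 10 seconds between fetches and stay out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("BadBot", "https://example.com/index.html"),
    ("GoodBot", "https://example.com/index.html"),
    ("GoodBot", "https://example.com/private/notes.html"),
]:
    # can_fetch() applies the first matching User-agent group's rules.
    print(agent, url, parser.can_fetch(agent, url))

# crawl_delay() returns None when no Crawl-delay applies to the agent.
print(parser.crawl_delay("GoodBot"), parser.crawl_delay("BadBot"))
```

Whether a crawler actually respects any of this is entirely up to the crawler.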
Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.
Oh yeah real nervous. They're getting on the bandwagon late; too late to monopolize this particular free (as in shut the fuck up) service. If by some miracle they produce something 'threatening', it will be because it's good or because the others have slacked off.
Everybody knows what Microsoft is bringing. Well almost everybody. Okay, I'll spell it out:
1: Bring lots of money.
2: Buy out a competitor.
3: Rename it Microsoft Search.
4: Attempt to trademark the word "Search".
5: Bind it tightly into Windows as an essential service.
6: Don't get it right until version 3.0.
7: Profit!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
You may license my patent on this idea on reasonable terms in exchange for shares of your company's stock.