Slashdot Mirror


How to Build a Search Engine

CowboyRobot writes "Three years ago, former Infoseek developer Matt Wells decided to go solo and build his own search engine, Gigablast. In this article, Infoseek founder Steve Kirsch interviews his former employee about the process and challenges of creating a modern, scalable search engine. From the article: 'Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast. It's a tight little community, and a lot of the people know and watch each other. Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.'"

270 comments

  1. Lol by SugoiMonkey · · Score: 5, Funny

    "even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast " Gigawho? You silly goose.

    1. Re:Lol by Anonymous Coward · · Score: 0, Funny

      what kind of faggot says "silly goose"?

      Why, silly faggots, of course!

    2. Re:Lol by SphericalCrusher · · Score: 5, Insightful

      That sounds a lot like self-advertisement to me. And there are A LOT more than just five companies! Take MetaCrawler and DogPile for instance -- they aren't on his list.

      --
      "Instant gratification takes too long." - Carrie Fisher
    3. Re:Lol by Anonymous Coward · · Score: 0

      Giga, please. :)

    4. Re:Lol by great+throwdini · · Score: 1
      Take MetaCrawler and DogPile for instance -- they aren't on his list.

      Both DogPile and MetaCrawler are owned by InfoSpace. There may be more than five companies, but not as much diversity as one would think.

    5. Re:Lol by Nasarius · · Score: 3, Informative

      And they're not search engines. They're just meta-search engines that compile the results of Google, Yahoo, etc.

      --
      LOAD "SIG",8,1
    6. Re:Lol by ozmo · · Score: 0, Troll

      teehee,netzero said to me,....loozer! Hahahahaha...........

    7. Re:Lol by nametaken · · Score: 1

      I have a problem with most of them... they're paid for listings, if you're listing commercial sites. That's not a search engine, that's a company directory. It's one of the reasons to love google so much.

    8. Re:Lol by Anonymous Coward · · Score: 0

      Also, MetaCrawler and DogPile don't really run their own search engines, they just combine results from other search engines.

    9. Re:Lol by Distortal · · Score: 1

      Fair's fair - he does have a B.S. in Computer Science.

  2. Gigablast... by vosbert · · Score: 4, Interesting

    Am I the only one who's never heard of Gigablast? But then, not too many years ago, there was a time when I'd never heard of Google. Kinda makes one wonder how secure a lead over its competition any search engine can ever hope to obtain, and what kind of chance Microsoft stands of usurping the search engine market.

    1. Re:Gigablast... by Anonymous Coward · · Score: 0

      That's ridiculous.

      Gigablast is in no way going to usurp Google.

    2. Re:Gigablast... by Anonymous Coward · · Score: 2, Insightful
      Am I the only one who's never heard of Gigablast...


      i never heard of them either, but heard of all others there.


      what a load of shit, this guy works on one search engine, then compares his engine to the other top 4 competitors. What about alltheweb.com, for instance? I've at least heard of that one, it ain't there.


      It's like Linux One (remember them?) claiming there are four main Linux distributions: Red Hat, Debian, Slackware, and Linux One.

    3. Re:Gigablast... by Lord+Kano · · Score: 1

      Gigablast is in no way going to usurp Google.

      Not this year at least. I remember when no one thought that a bunch of college students with this Google thing would be able to unseat Yahoo.

      LK

      --
      "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
    4. Re:Gigablast... by hords · · Score: 1

      Nope never heard of Gigablast. I guess the author of the post didn't read the recent Slashdot article about Amazon's new search engine.

    5. Re:Gigablast... by Anonymous Coward · · Score: 0

      I believe he lumped that into yahoo

    6. Re:Gigablast... by cliffy2000 · · Score: 2, Informative

      "Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast."
      You don't even have to RTFA. Read the summary.

    7. Re:Gigablast... by platipusrc · · Score: 1

      That one's built on top of Google, too. That means that it's included by association, just like the ones that are built on top of Yahoo!.

      --
      And the muscular cyborg German dudes dance with sexy French Canadians
    8. Re:Gigablast... by mikis · · Score: 2, Informative

      A9 serves Google results, so you can't quite call them a "search company". But I'm sure there are at least a dozen as big and famous as "Gigablast".

    9. Re:Gigablast... by funky+womble · · Score: 1

      But the URL format is so much quicker to type if you want to do a quick search from a machine with no fast way to search google.

    10. Re:Gigablast... by mikis · · Score: 1

      Yes, there is. In Opera, just type "g searchterm" as the URL. In MSIE, go to Search / Customize / Autosearch settings and choose "Google Sites" and "Just display results in the main window"; then you can type a query in the address bar.

  3. Not *that* complicated by Anonymous Coward · · Score: 5, Funny
    This will cover about 50% of your job:
    select * from internet where keywords like '%asian sex free pics%';
    1. Re:Not *that* complicated by Anonymous Coward · · Score: 0

      Sweet! I'm heading over to the PTO to patent that right now...

  4. Hmmm.... by elid · · Score: 5, Interesting

    Gigablast: "273,384,720 pages indexed"
    Google: "Searching 4,285,199,774 web pages" That's quite a big difference.

    1. Re:Hmmm.... by ixplodestuff8 · · Score: 5, Informative

      I've never heard of Gigablast either, but it seems to have some interesting features. It links to the Wayback Machine's page on the site so you can see past versions of the site. It also shows the most common phrases in which the search term was found. It also archives pages like Google does, and goes as far as linking to OTHER search engines to help out your search.

    2. Re:Hmmm.... by Waffle+Iron · · Score: 4, Funny
      Gigablast: "273,384,720 pages indexed"
      Google: "Searching 4,285,199,774 web pages" That's quite a big difference.

      At least this Gigablast name is closer to the truth. They are only exaggerating their page count by a factor of 3.7 : 1.

      By my math, Google comes up short by 2.3x10^90 : 1.
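
      For anyone who wants to check those two ratios, here is a quick sketch in Python. It assumes "giga" means 10^9 and that Google's name is being read as a googol (10^100), and it takes the page counts quoted above at face value. The constant and variable names are made up for illustration.

```python
# Page counts as quoted in the parent comment.
GIGABLAST_PAGES = 273_384_720
GOOGLE_PAGES = 4_285_199_774

GIGA = 10**9      # what the "Giga" in Gigablast would promise
GOOGOL = 10**100  # the number Google's name is a play on

# How far each name overstates the actual index size.
giga_ratio = GIGA / GIGABLAST_PAGES
googol_ratio = GOOGOL / GOOGLE_PAGES

print(f"Gigablast: {giga_ratio:.1f} : 1")
print(f"Google:    {googol_ratio:.1e} : 1")
```

      Rounded to two significant figures, that gives 3.7 : 1 and 2.3e+90 : 1, which matches the numbers in the joke.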

    3. Re:Hmmm.... by trenton · · Score: 3, Informative

      Have you tried searching, though? Google pulls back more (quantity and accuracy) than Gigablast for the same terms. For example, search for "larry wall interview" and get 77,300 vs 9,759 results. I'm certainly not saying Google doesn't have its share of problems (it seems to be steadily declining in quality). And I do like the categories/tags that Gigablast provides, but on overall quality I'll give it to Google.

      --
      Too big to fail? Does that make me too small to succeed?
    4. Re:Hmmm.... by alib001 · · Score: 1

      I'd be more interested in seeing the numbers for the amount of pages indexed that were search engine "optimized" cruft.

      Look! We got lots of results for you! Yes... but most of the top twenty you have returned should actually be in the bottom twenty.

    5. Re:Hmmm.... by RzUpAnmsCwrds · · Score: 1

      ...

      And Yahoo Search returns 309,000 results.

      It's not the number of results, it's how they are arranged.

    6. Re:Hmmm.... by Timmmm · · Score: 1

      By my math, Google comes up short by 2.3x10^90 : 1.

      Except that google is a search engine, and a googol is 10^100.

    7. Re:Hmmm.... by Jugalator · · Score: 1

      Gigablast: "273,384,720 pages indexed"
      Google: "Searching 4,285,199,774 web pages" That's quite a big difference.


      Yes, and noticeable to me. I tried to search for a site I know, and regardless of how many terms I entered, it didn't spot it... In the end, the results were down to 2 hits (with only three common keywords) and it wasn't among them.

      Heck, it doesn't put www.slashdot.org first when searching for Slashdot. :-P Actually, I couldn't even find a link to the main page when searching for Slashdot.

      --
      Beware: In C++, your friends can see your privates!
    8. Re:Hmmm.... by SpinyNorman · · Score: 1

      Yeah, but he's achieved that 1/20 of Google's page coverage using only 8 computers, vs Google's thousands, and expects to exceed Google's page coverage this year by adding computers/disks/memory.

      Definitely one to watch.

      Google was the first to return mostly relevant results, but it definitely isn't perfect... For a start the page ranking algorithm naturally means that large well-linked commercial sites outrank the smaller more specialized or personal ones. Sometimes that's what you want, but sometimes it's not...

    9. Re:Hmmm.... by bluewhale · · Score: 1

      But that really doesn't matter. What's more important is the relevance of the first 70-100 links they bring. Most of the time, you just search for something and don't even bother checking after page 2 or 3. The only time I tried going all the way down to the 200th page was when I searched for my name and found my page on the 212th page of links, to be precise. The size of the database matters only when you do the search, not when you bring up the results. Accuracy is pretty much the only thing that concerns a user. Period.

    10. Re:Hmmm.... by Wolfier · · Score: 1

      4294967296 - 4285199774 = 9767522 pages to overflowage. Go Go Go!
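
      The subtraction above can be sketched in Python. This assumes, purely for the sake of the joke, that the page counter is an unsigned 32-bit integer; nothing in the thread says how Google actually stores it. The helper `wrap_uint32` is a made-up name that mimics 32-bit wraparound.

```python
UINT32_VALUES = 2**32         # number of distinct values in an unsigned 32-bit int
GOOGLE_PAGES = 4_285_199_774  # count quoted on Google's front page

def wrap_uint32(n: int) -> int:
    """Mimic what an unsigned 32-bit counter would hold after counting to n."""
    return n % UINT32_VALUES

# Pages that can still be added before the hypothetical counter wraps to zero.
pages_left = UINT32_VALUES - GOOGLE_PAGES
print(pages_left)

# One page short of the limit the counter is at its maximum; one more wraps it.
print(wrap_uint32(GOOGLE_PAGES + pages_left - 1))
print(wrap_uint32(GOOGLE_PAGES + pages_left))
```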

  5. That list makes no sense by jonman_d · · Score: 5, Insightful

    I have to say, that list makes no sense. Maybe if you'd switch "Gigablast" with "MSN", you'd have a list of some of the major search engines, but it sounds like this guy is just tooting his own horn (and without the proper credentials).

    1. Re:That list makes no sense by K-Man · · Score: 1

      He said "search engine companies", not search engines. Companies which do other things don't qualify. MSN, for instance, is affiliated with some company that makes computer mice.

      --
      ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
    2. Re:That list makes no sense by Anonymous Coward · · Score: 0

      Then by that logic, since Boeing makes "missile defense systems", it cannot be an "airplane" company. The anti-MS crap runs so deep, this site loses a lot of credibility.

    3. Re:That list makes no sense by anno1602 · · Score: 1

      Well, as of yet, MSN is just using tweaked Altavista results, so MSN is not a search engine, just a portal site, and Microsoft is not a search engine company. That'll change, and he said so.

    4. Re:That list makes no sense by Anonymous Coward · · Score: 0

      You're whining about credibility, yet your comment remains in a nether-region unread by anyone who gives a rat's ass.
      When M$(do you like that?) builds a search engine that encompasses those of its competitors- as boeing has clearly done, invalidating that stupid fucking argument- then it can be called a "Search engine company," and you'll be free to write unread comments about other things.
      Nothing inherently interesting about this post of mine... just a smart troll putting down a dumb one.

    5. Re:That list makes no sense by K-Man · · Score: 1

      The issue is whether the company will bet the farm on search or not. Many companies build search tools of various kinds, but few try to do so as a pure play. MSN has recently refocused its efforts on search, but it hasn't historically tried to make money on that alone.

      --
      ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  6. Whatever happened to by nevek · · Score: 5, Interesting

    Hotbot, Lycos, Mamma.com, Iwon.com, wisenut.com, looksmart.com, teoma.com, alltheweb.com, deja.com, directhit.com, excite.com, go.com, infoseek.com, invisibleweb.com, flipper.com, messageking.com, magellan.com, nbci.com, snap.com, northernlight.com, openfind.com, webcrawler.com

    ahh the dotcomfallout

    at least www.cowboynealsproncollection.com is doing well

    1. Re:Whatever happened to by Cyno01 · · Score: 1

      Don't forget dogpile, heh...

      --
      "Sic Semper Tyrannosaurus Rex."
    2. Re:Whatever happened to by Anonymous Coward · · Score: 0

      what is dogpile supposed to mean as a name, anyway?

      a pile of dogshit that you dig through, getting your already grubby paws all dog-shitty, to find search results?

      They work hard to find your results.

      -da tr0ll

    3. Re:Whatever happened to by cubic6 · · Score: 1

      Dogpile has to be the worst name for anything.

      --
      Karma: Contrapositive
    4. Re:Whatever happened to by fatphil · · Score: 1

      Very interesting question!

      hotbot is the same search engine as lycos.
      mamma is simply a meta search.
      iwon didn't serve me a page with any content.
      looksmart, wisenut, teoma and AllTheWeb were mentioned in the list above, I can only assume they're still active.
      deja.com is google.
      directhit.com is not a search engine
      excite refuses to serve me a page.
      go.com isn't a search engine, it's a portal with a google link
      infoseek is just a redirect to go.com

      Ayayay, that's enough already.

      FP.

      --
      Also FatPhil on SoylentNews, id 863
    5. Re:Whatever happened to by moartea · · Score: 1

      Hmm, I tried searching google for "www.cowboynealsproncollection.com" and all I got was this:
      Tip: Try Google Answers for help from expert researchers
      I didn't know Google was into this kind of thing...

    6. Re:Whatever happened to by Anonymous Coward · · Score: 0

      Yeah!
      And whatever happened to archie and gopher and veronica and jughead?

    7. Re:Whatever happened to by Anonymous Coward · · Score: 0
      at least www.cowboynealsproncollection.com is doing well
      Slashdotted. Anyone have a mirror up?
    8. Re:Whatever happened to by Anonymous Coward · · Score: 0

      www.cowboynealsproncollection.com No match for "WWW.COWBOYNEALSPRONCOLLECTION.COM".

      Thanks for getting my hopes up dick.

    9. Re:Whatever happened to by Rotting · · Score: 1

      deja.com was bought by google a year or two ago and the URL now brings you to google groups.

  7. How to Build a Search Engine? by Anonymous Coward · · Score: 0

    First, corner the name "Google."

    Second, work your fscking asses off for several years.

    Third, become an over-night success!

    1. Re:How to Build a Search Engine? by Anonymous Coward · · Score: 1, Funny
      Fourth, question-mark question-mark question-mark

      Fifth, Profit.

    2. Re:How to build a search engine? by Anonymous Coward · · Score: 0

      >> How to build a search engine?
      > I dunno. I better google it.


      LINKAGE ;)

  8. only 5? by micker · · Score: 5, Informative
    The poster left out vivisimo.... lately it's been all I use...

    and to the post above this.. what does 2 trillion hits matter against 2 million if they can't get what you really need up onto the first page

    --
    Words are only yours until someone else uses them...
    1. Re:only 5? by tirenours · · Score: 1

      But Vivisimo, as great as it is, isn't a search engine. It is a meta-search engine. In other words, it searches search engines.

      Same thing for Dogpile, MetaCrawler and Kartoo.

    2. Re:only 5? by nacturation · · Score: 1

      and to the post above this.. what does 2 trillion hits matter against 2 million if they can't get what you really need up onto the first page

      It doesn't matter one bit. That is, until you want page 2 million + 1. Then, all of a sudden, having a few more billion pages to index is a good thing.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    3. Re:only 5? by cyclobotomy · · Score: 1

      The more pages there are to search, the more likely the engine is to find the one you are looking for.
      Hits are different from index size.

  9. Humph by SlamMan · · Score: 4, Funny

    "and everyone's a little bit nervous to see what it's bringing.'"

    Money. Lots and lots of money.

    --
    Mod point free since 2001
    1. Re:Humph by LBArrettAnderson · · Score: 0

      I don't understand why a new search engine would make anyone nervous. Are they going to allow people to search for credit card numbers? bank account information? personal home addresses?

      That comment really baffles me... if it makes you THAT nervous, DON'T use it.

      I don't see how it could harm you if you don't use it... google's not going anywhere, even if another search engine takes over as "most popular"

    2. Re:Humph by Anonymous Coward · · Score: 0

      "and everyone's a little bit nervous to see what it's bringing."


      As well as money - _lawyers_. Lots and lots of lawyers.

    3. Re:Humph by Anonymous Coward · · Score: 0

      It's implied. Lawyers go wherever the money is.

    4. Re:Humph by Anonymous Coward · · Score: 0

      "and here they come around the bend! it's money followed closely by lawyers with jews falling back and the nazis right on their tails!"

    5. Re:Humph by Anonymous Coward · · Score: 2, Funny

      >> It's implied. Lawyers go wherever the money is.

      Not in every instance. Some lawyers suck so much the money comes right to them.

  10. P2P? by ron_ivi · · Score: 4, Interesting
    I always thought P2P would be a good infrastructure for a search engine.

    That way, I could share the load with people with similar interests as myself.

    For example, I would like a search engine that was more up-to-date crawling the PR of my competitors, but couldn't care less about most other companies. If I were running my own node of a P2P engine, I could set my node to focus on that, and anyone else who shared my interests could tap into it.

    1. Re:P2P? by lakeland · · Score: 3, Informative

      There was one a while back. Everybody installed a program kinda like glimpse on your server and indexed your own web site and a few others. IIRC it would automatically work out by IP address any sites that were nearby and not already over-indexed. They all then kinda pooled the results.

      One benefit of it is that you can keep the index of your website up to the minute if you really want. I guess they just never got enough people running the indexing software.

    2. Re:P2P? by Anonymous Coward · · Score: 2, Interesting

      how about an "open" search engine? any takers? post below....

    3. Re:P2P? by Anonymous Coward · · Score: 0
      Not for me. I'd rather see a Free Software one (as in FSF philosophy, rather than OSI's far more commercial leanings).

      If you're up for one focused more on Free Software philosophies rather than Open Source, perhaps.

    4. Re:P2P? by Anonymous Coward · · Score: 0

      This could actually work well, but I don't think so with the way you want to do it. Instead, your computer indexes the pages you look at and which pages you used to get there, like Google's linking system but better. So if you searched for something, you would see a list organised by most visited. Of course this would cause a feedback loop, where popular sites kept increasing in search rank without end. But to counterweight this, there could be a rating system built into the browser. One other thing is that it could see which web pages people went to after doing a search and score them higher, and even which web pages they went to from there. So let's say you look for "saab trunk repair help", but the best site on this is British and uses the word "boot"; because a lot of people get to this site from a site that uses the word "trunk", it shows up on the first search page. I think it would be hard to get a large list of sites, because of the P2P nature, so more obscure sites might fall under the radar. It would be weird in that certain searches could vary depending on the time of day, because of people going offline. Well, in the end I think it seems possible to do this in a way that creates good search results (that are hard to skew), doesn't invade people's privacy, and hopefully doesn't eat up too much bandwidth and CPU time.

    5. Re:P2P? by cgenman · · Score: 4, Interesting

      The closest thing to what you're talking about is Grub, which is run by Looksmart as a dead-link checker and also feeds to WiseNut. While it doesn't allow you to crawl sites that you don't have control over, it does allow you to crawl your own site.

      Personally, I've wanted a Google toolbar that indexes the sites that you surf, and adds additional positive weight to the sites that you linger on. It may not know what you liked there, but it knows that you liked it.

      Completely offtopic, but does anyone know of a screensaver on Windows that displays random (or spidered) web pages? I've been looking for an equivalent to the XWindows version for years.

    6. Re:P2P? by toddler99 · · Score: 2, Informative

      There has been work in this direction already at Lehigh University; check it out here: http://wume.cse.lehigh.edu/

    7. Re:P2P? by cgenman · · Score: 4, Interesting

      ...Just answered my own question. Combining A+ Web Screensaver (nonfree) with a random web page URL (www.uroulette.com/visit.php) gets a random web page display on idle. Yay! Now I'll never know if I'm going to a polynesian community church or a poorly written Raiders fansite.

      Now if there were only a way to open said site and continue reading in non-screensaver mode...

    8. Re:P2P? by sik0fewl · · Score: 1

      Did you try Google? .. Altavista? Looksmart? Teoma? Inktomi? Gigablast? AskJeeves? Yahoo!? Wisenut? Alltheweb?

      --
      I remember when legal used to mean lawful, now it means some kind of loophole. - Leo Kessler
    9. Re:P2P? by whowho · · Score: 1

      look at mozdex

    10. Re:P2P? by Puchu · · Score: 1

      Personally, I've wanted a Google toolbar that indexes the sites that you surf, and adds additional positive weight to the sites that you linger on. It may not know what you liked there, but it knows that you liked it.

      Gee, my favourite porn sites popping up as most relevant when I search for Julia Fractals? I can't wait!

    11. Re:P2P? by gnu-generation-one · · Score: 1

      "how about an "open" search engine?"

      First, figure out how to deal with the fraudsters and spammers and cheats.

      Then figure out how the system will work when 80% of the people involved and 90% of the systems involved are cheating (a.k.a. google and P2P fileshares)

    12. Re:P2P? by FewClues · · Score: 2, Informative

      "Personally, I've wanted a Google toolbar that indexes the sites that you surf, and adds additional positive weight to the sites that you linger on. It may not know what you liked there, but it knows that you liked it."

      You can get that service complete with a toolbar from http://www.vivisimo.com which is a great search engine.

  11. Interesting by JoeShmoe950 · · Score: 1

    I wouldn't consider Gigablast a major contender, yet.... It looks nice, minimalist like Google. The ads (Gigabits) are in a separate section. Also, Gigablast appears to be handling the Slash blast just fine. If it can survive ./, that's a good sign. I haven't had much time to test out its search capabilities, and yes, it doesn't have that many pages indexed compared to Google, but it has a chance. I could see it picking up.

    1. Re:Interesting by bcrowell · · Score: 2, Insightful
      I guess one can evaluate Gigablast based on what it can do now, or based on what foundation they're building for the future.

      Right now, one difference between Gigablast and Google is that Gigablast doesn't seem to index PDF files. This makes me sad, since I run a web site whose sole purpose is to serve up big PDF files.

      There are also some minor usability problems compared to Google. If your search returns more than 10 results, you can't tell how many there are. You have to understand how to do "+keyword" and "-keyword" -- there doesn't seem to be a form you can fill in like Google's "advanced search" form.

      It does seem to be pretty darn fast, though, and on the searches I tried, it gave reasonable results.

    2. Re:Interesting by prockcore · · Score: 4, Interesting

      If it can survive ./, that's a good sign.

      Not really. I was impressed with the power of a good slashdotting until we made the slashdot frontpage a few weeks ago (we also made it to the frontpage a few years ago but at that time we were serving static htmls).

      An article was pulled out of a MySQL database, XSL transformed, sent to the webserver via SOAP and finally sent to the user as about 150k of HTML and images. Repeat 80,000 times over a 5 hour period.

      This is hardly an impressive feat. I expected more, but it turns out that slashdot really only sends about 20-30k unique visitors to your site.

      Yes, I used to be impressed with the power of a slashdotting, but now I realize that it's just the result of very crappy sites run on very crappy desktop machines pretending to be servers.

      So, no, them withstanding a slashdot link isn't a good sign, it's the very least we can expect of a commercial entity.
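
      To put those figures in perspective, here is a back-of-the-envelope sketch in Python. It assumes the 80,000 page views were spread evenly over the 5 hours (real traffic peaks early, so the peak would be higher) and that "150k" means 150 kilobytes of 1000 bytes each. All names are made up for illustration.

```python
REQUESTS = 80_000  # page views over the whole slashdotting
HOURS = 5
PAGE_KB = 150      # assumed: kilobytes of HTML and images per page view

seconds = HOURS * 3600

# Average load, assuming a perfectly even spread.
req_per_sec = REQUESTS / seconds
mbit_per_sec = req_per_sec * PAGE_KB * 1000 * 8 / 1_000_000
total_gb = REQUESTS * PAGE_KB / 1_000_000

print(f"{req_per_sec:.2f} requests/s")
print(f"{mbit_per_sec:.2f} Mbit/s")
print(f"{total_gb:.1f} GB total")
```

      Under those assumptions the average works out to under five page views a second, which supports the point that a commercial site should be able to absorb it.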

    3. Re:Interesting by ixplodestuff8 · · Score: 3, Informative

      "there doesn't seem to be a form you can fill in like Google's "advanced search" form."

      except of course, for the advanced search form

    4. Re:interesting by LinuxGuyFriend · · Score: 1

      I don't think it would be practical but probably interesting as an academic exercise.

      It would be too slow compared to a big cluster setup like google. I think that there are going to be enough private organizations with plenty of cash to build those setups for a very long time (unless of course one becomes a monopoly). Anyway, in the case of google, I doubt that there is much load involved per query.

      I think that what would be better for quicker refreshes would be to shift the responsibility of updating the search engines away from the search engines, to the web sites or an intermediate peer, similar in a way to the P2P that you talked about. But then again, just for the simplicity, it's even better for Google to just run multiple update bots in different regions of the world simultaneously with a preset acceptable refresh rate.

    5. Re:Interesting by Gherald · · Score: 2, Interesting

      Here's an example of a search that turned up a PDF link. It is very clearly labeled PDF on a red background:

      http://www.gigablast.com/search?k3v=898090&s=10&q=%22preston+alexander%22+-%22victoria+ashley%22

      Pretty nice if you ask me. I hate opening PDF links by accident. Sometimes in Google I accidentally click them before I realize they are going to be opened by some stupid browser plugin or (more often than not) Adobe's bloated Reader.

    6. Re:Interesting by ashot · · Score: 1

      at what time during what day was the story posted?

      --
      -ashot
    7. Re:Interesting by prockcore · · Score: 1

      at what time during what day was the story posted?

      Friday, 1PM MST, 304 comments.

    8. Re:Interesting by lylum · · Score: 1

      >./ Ah yeah, the good old DOT SLASHING.....

    9. Re:Interesting by bcrowell · · Score: 1
      Aha, thanks for the correction. I was basing my statement on the fact that they indexed my own web pages, but didn't seem to have indexed the books in PDF format that are pointed to by my web pages. For instance, if you search on this phrase
      • "a useful analogy can be made with the role of randomness in evolution"
      in Google and Gigablast, only Google turns up the PDF-formatted book it occurs in. I thought it might just be omitting very large PDFs, but searching on
      • "When entering grades, make sure you have your Num Lock turned on"
      gives the right result, a very small PDF file, in Google, but not in Gigablast.

      I guess it's intrinsically a little difficult to make a good comparison of search engines, since their inner workings are secret. They want it to be hard to figure out what their search engines are really doing, so that it'll be harder to spam them.

    10. Re:Interesting by bcrowell · · Score: 1

      Pretty nice if you ask me. I hate opening PDF links by accident. Sometimes in Google I accidentally click them before I realize they are going to be opened by some stupid browser plugin or (more often than not) Adobe's bloated Reader.
      On the other hand, Google provides those wonderful html translations of PDF files. The conversions are amazingly good. Gigablast doesn't seem to have them.

    11. Re:Interesting by markxz · · Score: 1

      I don't know when this happened, but the add URL function has been disabled (perhaps too many Slashdotters submitting links to their sites).

    12. Re:interesting by Oliver+Defacszio · · Score: 1

      I'd be happy to wait two, five or ten seconds for results that are better than the wasteland of shitty 'e-commerce' that Google has become.

      --

      -
      Inventor of the term 'pardon my French'.
  12. Microsoft at the party by bigberk · · Score: 1, Funny

    Microsoft at the party would probably look something like this
    "Pass the dip, guys!"

  13. Isn't yahoo powered by google? by Toxygen · · Score: 1, Insightful

    I mean, I know they're different sites and all, but isn't the yahoo site just the google search bar with all those category links added?

    1. Re:Isn't yahoo powered by google? by levram2 · · Score: 5, Informative

      Yahoo used to use Google, but they bought Inktomi and have switched to their search engine. MSN also uses the Inktomi search engine, but tweaks the results.

    2. Re:Isn't yahoo powered by google? by tvh2k · · Score: 3, Informative

      Nah, they dropped google on Feb 17th. Get with the program :-D

    3. Re:Isn't yahoo powered by google? by endx7 · · Score: 1

      Yahoo used to use Google, but they bought Inktomi and have switched to their search engine. MSN also uses the Inktomi search engine, but tweaks the results.

      Right before google became a big name, yahoo used inktomi (inktomi used to be a really big name in the search engine industry). Yahoo used to use google more recently, but now they don't.

      It looks like yahoo bought inktomi about a year ago, so I guess that's what they are using again.

  14. Matt's a good guy by Thanatopsis · · Score: 2, Informative

    We use Gigablast as a backfill for one of our search engines. His stuff is very speedy and he's a good guy to work with.

    1. Re:Matt's a good guy by cybermace5 · · Score: 4, Funny

      I'm glad you told everyone he's a good guy, for a minute there I just assumed he was an evil, scheming villain.

      --
      ...
    2. Re:Matt's a good guy by Anonymous Coward · · Score: 0

      Or is he...

    3. Re:Matt's a good guy by RussHart · · Score: 1

      He may be a nice guy, but has he got notes around the office reminding him not to be evil? A la Google.

    4. Re:Matt's a good guy by Anonymous Coward · · Score: 0

      Really, this is moderated funny, but I swear 90% of the time that kind of "thinking" is how the Slashdot consensus is actually formed.

  15. Voting methods and search engines.... by Anonymous Coward · · Score: 5, Informative

    ...have a lot in common. Different search engines allow sites to "vote" on which ones are the most authoritative, and the best methods in one field can give insight into the best method in the other.

    For example, there is the Kemeny order (named after the same guy who came up with BASIC, John G. Kemeny). Using a version of ranked ballots and sorting websites by the mean Kemeny order gives you a method that is surprisingly good at putting authoritative sites at the top and spam sites at the bottom. For those of you who like in-depth analysis and don't mind math, the following is a good site:

    http://www10.org/cdrom/papers/577/
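
    For the curious, here is a minimal brute-force sketch of Kemeny-style rank aggregation. It assumes every engine supplies a full ranking of the same small set of sites; the site names and the helper names (kendall_tau, kemeny_order) are made up for illustration. It tries every permutation, so it is only a toy and not how a real engine would do it.

```python
from itertools import combinations, permutations

def kendall_tau(order_a, order_b):
    """Count the pairs of items that the two rankings put in opposite order."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def kemeny_order(ballots):
    """Return the ranking with the smallest total Kendall tau distance to all ballots.

    Brute force over every permutation, so only usable for a handful of items.
    """
    items = sorted(ballots[0])
    return min(
        permutations(items),
        key=lambda candidate: sum(kendall_tau(candidate, b) for b in ballots),
    )

# Hypothetical ballots: each list is one engine's ranking, best site first.
ballots = [
    ["docs.example", "blog.example", "spam.example"],
    ["blog.example", "docs.example", "spam.example"],
    ["docs.example", "spam.example", "blog.example"],
]
print(kemeny_order(ballots))
```

    The aggregate is the ordering that disagrees with the voters on the fewest pairs, which is why a site that only one voter likes tends to sink.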

  16. Other Search Engines by GSPride · · Score: 2, Insightful

    I know that other people must use search engines other than Google, but who? And why? I could see Netscape, because it's the default homepage for many browsers, and maybe Ask Jeeves due to the easy syntax, but why would people go out of their way to use Gigablast or Looksmart? Who's even heard of those two?

    --
    Apple has never claimed not to be evil, they're just very stylish about it.
  17. In my opinion..... by Kenja · · Score: 4, Funny
    In my opinion the best search engine is a Ford T-Block. Put that into a light weight steel frame and we can search them down and kill em in the street like wild animals.

    Whoa, hold on. Wrong site. Never mind.

    --

    "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
  18. BOOBLE! by the+MaD+HuNGaRIaN · · Score: 4, Funny

    What about BOOBLE.

    1. Re:BOOBLE! by Anonymous Coward · · Score: 0

      Booble sux!
      The "I am feeling lucky" button didn't get me laid!

  19. Nice by Anonymous Coward · · Score: 1, Funny

    I found over one million hits for XXX and not even one hit, as far as I could tell, to do with the fucking Vin Diesel piece of shit movie.

    1. Re:Nice by Anonymous Coward · · Score: 0

      If you want pr0n you're far better off searching with Xahara.

    2. Re:Nice by 16K+Ram+Pack · · Score: 1
      I've been giving some thought to search engine referencing, and how XxX was a huge mistake, because searching for it would be difficult.

      Something I read recently also suggested that the British model Jordan is going back to using her real name of Katie Price. Partly, the suggestion was that searching for "Jordan" brings back results about middle eastern countries etc.

      It's also I think why companies are setting up with these crappo names like Accenture and Consignia - to give targeted search listings.

    3. Re:Nice by sfe_software · · Score: 1

      I've been giving some thought to search engine referencing, and how XxX was a huge mistake, because searching for it would be difficult.

      I often wonder if this is sometimes done on purpose. One example is the band Live, given that searching for "Live" of course turns up millions of unrelated results... though Live was around (and so-named) quite a bit before the P2P thing exploded... but I do still wonder if this is something that is considered these days when naming a band/movie/whatever (its searchability, for good and/or bad)...

      Just a random thought that popped up...

      --
      NGWave - Fast Sound Editor for Windows
    4. Re:Nice by corian · · Score: 1

      I've been giving some thought to search engine referencing, and how XxX was a huge mistake, because searching for it would be difficult.

      You think that's bad, try searching for something like .NET or similar.

      Search engine handling of punctuation is (at the very least) very inconsistent and unpredictable.

  20. Lycos anyone by Anonymous Coward · · Score: 1, Insightful
    > there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast.

    What about Lycos you insensitive clod? They're still around.

    In the UK around the year 2000, they advertised Lycos on the TV. The advert featured a bagpiper who had a kilt and no underpants and asked Lycos to find some underpants. A dog then went off at great speed, and came back with underpants in his jaws, and then, the bagpiper could safely play the bagpipes when there were sudden gusts of wind. Anyway, just for fun, I typed in 'underpants', on Lycos and the first item it came up with was a pornographic website. However, this was lycos.com, and not lycos.co.uk which is what was advertised.

    1. Re:Lycos anyone by Thanatopsis · · Score: 4, Interesting

      Lycos search no longer runs its own crawler. Matt's talking about people with their own crawler and algo.

    2. Re:Lycos anyone by modder · · Score: 1

      They ran these ads in the United States too. I tried it and was pretty unimpressed.

  21. Re:What's Next? by Anonymous Coward · · Score: 0

    Dude, it's called a magnifying glass.

  22. Competition, in this case... a good thing by Jtoxification · · Score: 4, Insightful

    We all win. With the increasing # of sites, content, web services, spam, popup attacks, and "please allow us to rape your computer" certificates to download (that's the main reason I use Firefox when on Windows now: you can't tell I.E. not to accept those damned installation certificates, nor to block requests to change the homepage), it becomes ever more difficult to find what you're looking for via Google's site ranking technology, especially when it's not something that everyone else looks for. Because they fight to be the best, we get cool things like ftp searches, grep and regexp searching of dmoz.org, video, image, and music searches, even linux and bsd search-specific pages. gMail, Microsoft's entry, and now Gigablast are all rewards we get to reap from each company attempting to set its roots deeper into the Internet like weeds vying for the same piece of dirt. We are extremely lucky, but then I doubt more than a handful of search engines will ever hold top ranks at one time, due to the fact that they are so specialized in what they do. Just hope Gigablast and Google don't decide to create a new IM service, too.

    --
    --I gots 99 problems but a new machine ain't one!
    AMD! Asus! Whoot! 6 years!
  23. I think the guy just expanded his database by Anonymous Coward · · Score: 2, Interesting

    By placing this on /. he got:

    (("Slashdot serves 50 million pages per month"/(# users actually checking out this story))*number of searches tried) + a residual amount that might actually use this search engine more

    And what they might be interested in.

    1. Re:I think the guy just expanded his database by Ieshan · · Score: 1

      That 50 mil # is from FOUR YEARS AGO.

      Think about how much that number has changed in four years.

    2. Re:I think the guy just expanded his database by Anonymous Coward · · Score: 0

      No, what he got was geeks like myself forcing the engine to do searches for "+the +a +of +it +and" to see what his worst case lookup was like. (Or am I a freak?)
      P.S. Google did it much faster.

  24. AV by TSNV · · Score: 2, Informative

    I like AV because it's the only one (that I know of) that supports advanced embedded Boolean. Many a time Google fails to produce, and a well-built AV search will pop out what I'm looking for - albeit from a smaller selection.

    --
    If there is hope, it lies in the prowles.
    1. Re:AV by CAIMLAS · · Score: 1

      AV? What is that? ArdVark search engine?

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    2. Re:AV by BlueShad0w · · Score: 1

      I think it's probably Altavista

      *completely fails to spot sarcasm*

    3. Re:AV by Anonymous Coward · · Score: 0
      I remember when NorthernLight first came on the scene, they advertised themselves as the only search engine which would provide meaningful results for the search:
      • to be or not to be
  25. Lycos? HotBOT??? by Anonymous Coward · · Score: 1, Interesting

    What about hotbot? Lycos?

  26. MOD PARENT UP! by Cyno01 · · Score: 0, Redundant

    bastard made me shoot jello out my nose, fucking ow...

    --
    "Sic Semper Tyrannosaurus Rex."
  27. Only 5? by Anonymous Coward · · Score: 0

    What about Amazon/A9? We've seen enough hype about them this week to know they're in the search game too.

  28. What about patents? by enosys · · Score: 4, Interesting
    What about patents? A lot of the stuff that goes into a search engine must be patented by now. I'm sure that if you create a search engine you'll end up infringing a bunch of these patents. Yes, I'm sure that in many cases it's obvious, and there's probably prior art, but I expect that the patents are still there and it's like a minefield of patents.

    So how do you make a search engine and not get sued for infringement, or at least be able to win in the lawsuits?

    1. Re:What about patents? by Anonymous Coward · · Score: 0

      Be Microsoft.

    2. Re:What about patents? by AmVidia+HQ · · Score: 1

      Yes and no. You can't patent "a search engine", only parts of it. Only specific techniques and algorithms are patentable (or at least that's how patents are supposed to work). Google's patented PageRank for example.

      --
      VIVA1023.com | Political Fashion.
    3. Re:What about patents? by Anonymous Coward · · Score: 0

      I was wondering about this just last night. I've thought of an idea for a site, something to help people out, not going to charge and I can't see it done anywhere before. But I would like to protect that idea from being used by anyone else, just because it's going to be a lot of effort for me to do it and I'd hate for some company (like MS or Google etc) to implement it as another extra of their engines.

      If, for example, I had thought up the idea for a search engine (a site searching other sites) and no one else had, would I have been able to patent that? Considering it wouldn't have had a generic meaning at that time? Or is it still regarded as just another website, and you can't patent that?

    4. Re:What about patents? by cookie_cutter · · Score: 1

      Just locate in the UK where software patents are illegal. I'm sure there are numerous other countries that also have such a policy.

    5. Re:What about patents? by dumbfounder · · Score: 1

      They are largely ignored because no one really knows exactly what goes on behind the scenes. And even in front of the scenes they are largely ignored. Overture has been suing Google ever since Google copied their auction-style keyword advertising idea. Google has patents that others are infringing upon, but so far they haven't taken any action (that I have noticed). Most patents issued in search (that I know about, anyway) have been sought as defensive moves. I would say you are right to be wary though, because there is a war coming and there is too much money involved for the big guys not to pull out all the stops.

  29. Thinking you were... by Cyno01 · · Score: 1

    here maybe?

    --
    "Sic Semper Tyrannosaurus Rex."
  30. ROFLMAO!!! by Anonymous Coward · · Score: 0
    Best... line... ever...

    "Mall security???? You want HK's and Starlight scopes for mall security?????"

  31. what timing for this /. article! by whowho · · Score: 5, Informative

    just as I'm pulling an all-nighter at this moment trying to embed a custom search engine into an app for use on an intranet.

    Actually what is more interesting is Nutch and Mozdex, which seems to be based around Lucene (what I am using to build my own search engine embedded into a Horde framework app). Although probably a lot simpler than the industrial grade stuff, for someone who has been used to throwing a word at an input screen and magically getting back results, the insight into the inner workings of search engines is very interesting.

    1. Re:what timing for this /. article! by whowho · · Score: 1

      argh, meant to hit the preview button... the point of the parent post being that I'd like to see more of the searching and search technology moving Open Source.

  32. Searching from the server's perspective by no+longer+myself · · Score: 5, Insightful
    Having a webserver hobby, I see the search engines crawl through my site daily. Of course in the beginning they hungrily tripped through the pages, taking in as much as could be found. Of course as time went on it seemed like some of the search engines had a new method of just grabbing a page or two every hour or so. I imagine this was to prevent over-taxing my box, but it made the first glance at my logs look artificially inflated as if people were visiting the site instead of just a crawler working its way through... slowly and painfully.

    I'd just prefer it if search engines would have enhanced rules for the robots.txt file so a webmaster could tell them more specifically how they want to be searched.

    Yes, I know you can put in a delay between page searches, and you can deny access to parts or all of the site, and you can even tell some or all crawlers to take a flying leap, but I'd like to tell them at the front door, "Search on Wednesday, make it fast, do a thorough job, and don't come back for a week."

    Too much to ask, right?
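
    For reference, the controls that already exist (a delay, denying part of the site, banning one crawler outright) can be checked with Python's standard urllib.robotparser. This is only a sketch: the bot names and URLs are made up, and Crawl-delay is a non-standard extension that only some crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is banned, everyone else is slowed
# down and kept out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 30
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

print(rules.can_fetch("BadBot", "http://example.com/index.html"))
print(rules.can_fetch("NiceBot", "http://example.com/index.html"))
print(rules.can_fetch("NiceBot", "http://example.com/private/notes.html"))
print(rules.crawl_delay("NiceBot"))
```

    Nothing in there can express "Wednesdays only" or "don't come back for a week", which is the gap being complained about.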

    1. Re:Searching from the server's perspective by AndroidCat · · Score: 2
      There are lists of the various bots used by search engines, and who's naughty/nice. (I've seen one list recently, just don't remember where it was.) You might want to see who the persistent ones belong to. There are also some that check for copyright/trademark violations, and their bots don't always behave.

      A few years ago, a number of Scientology-critical sites were getting hammered by bots from machoproducts.com, which seems like a weird link. (Rumours of a martial-arts cult, but no direct Hubbard connection.) I don't know if they still do that, and many sites just blocked their bad-mannered bots at the router.

      --
      One line blog. I hear that they're called Twitters now.
    2. Re:Searching from the server's perspective by Sirch · · Score: 1

      I'd like to know why robots.txt isn't protected from showing up in results from Google? Search for robots.txt on Google and you get a load of actual robots.txt files, which seems to negate its usefulness.

    3. Re:Searching from the server's perspective by MrNonchalant · · Score: 1

      Speaking as someone who wrote his own (rudimentary) spider there are other advantages (at least from my perspective) to crawling only a few of your pages at a time. It's a very simple way for a small spider to get to index a wide array of sites instead of being stuck on one huge (read: a phpBB forum or CVS repository) site. However the more specific robots.txt could also be beneficial to the spider too, as right now my spider (or any spider) can only guess what the webmaster wants and therefore stands a good chance of being evicted if it guesses wrong. About 18% of sites have disallow rules for all spiders (based on my limited crawls), not a good sign.

    4. Re:Searching from the server's perspective by Technonotice_Dom · · Score: 1

      Robotstext.org operate a web robots database which is fairly comprehensive - not sure how current it is though.

    5. Re:Searching from the server's perspective by firewrought · · Score: 1
      I'd just prefer it if search engines would have enhanced rules for the robots.txt file so a webmaster could tell them more specifically how they want to be searched.

      Brainstorm a list of all the things you would like to put into a robots.txt (or robots.xml, if we have to go that route).

      Find like-minded people who can help you refine and prioritize your list of "wants".

      Next, get a developer to help you figure out what an effective specification would look like. The spec should provide a clean syntax that will be easy for non-programming web developers to use, and it should incorporate most or all of your "wants".

      If at all possible, the programmer will get involvement from people at Google and other search engines to see how this approach will be perceived by them. Since adoption is voluntary, it is critical that this new standard realistically interoperate with their business... if everybody is going to say "only check my site on Wednesdays", then search engines that adopt the new conventions will have unused bandwidth on other days of the week; this would mean that they would have to purchase a lot more bandwidth to achieve the same level of service.

      Designing a good spec (and communicating the design features that make it a good spec) will be challenging. Once you've done all of this, submit it to W3C if possible and let them take the ball and run with it. If the W3C won't take it, try the IETF. If they don't take it, try a few other standards organizations, or whip up your own pretentious one.

      While the spec is being "blessed" by the standards organization, have your programmer create a reference implementation and open-source it under something super-friendly to rapid adoption (not the GPL in this case, but something like the clarified BSD license that is compatible with the GPL). At the same time, prepare materials (tutorials, examples, etc.) to help promote your new standard.

      IIRC, the XML SAX API is a pretty good example of how a group of people created a standard w/o going through a standards body. That approach might be effective too.

      --
      -1, Too Many Layers Of Abstraction
    6. Re:Searching from the server's perspective by no+longer+myself · · Score: 1
      Brainstorm a list of all the things you would like to put into a robots.txt (or robots.xml, if we have to go that route).

      I just did in the 3rd paragraph of my original post: "Search on Wednesday, make it fast, do a thorough job, and don't come back for a week." (Wednesdays are usually slow days for my server.) I suppose you could format it something like this: Agent=*, DaysAllowed=(Wed), Speed=Fast, Method=Complete, Recheck=10080
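
      A minimal sketch of how a crawler might read a line in that format. The directive names are the ones from the line above and no real crawler supports them; parse_policy and may_crawl are hypothetical helpers, and it assumes Recheck is in minutes (10080 minutes is one week).

```python
import re
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def parse_policy(line):
    """Split 'Key=Value, Key=(A,B)' into a dict; parenthesised values become lists."""
    policy = {}
    for key, value in re.findall(r"(\w+)=(\([^)]*\)|[^,\s]+)", line):
        if value.startswith("("):
            policy[key] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            policy[key] = value
    return policy

def may_crawl(policy, today, last_visit=None):
    """True if today is an allowed weekday and the recheck interval has elapsed."""
    allowed_days = policy.get("DaysAllowed")
    if allowed_days and DAY_NAMES[today.weekday()] not in allowed_days:
        return False
    if last_visit is None:
        return True
    wait = timedelta(minutes=int(policy.get("Recheck", 0)))
    return today - last_visit >= wait

policy = parse_policy(
    "Agent=*, DaysAllowed=(Wed), Speed=Fast, Method=Complete, Recheck=10080"
)
print(policy)
print(may_crawl(policy, date(2004, 4, 14)))  # 14 April 2004 was a Wednesday
```

      Speed and Method are parsed but left uninterpreted here, since what "Fast" or "Complete" should mean is exactly the part a real spec would have to pin down.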

      Find like-minded people who can help you refine and prioritize your list of "wants".

      And that's the main reason why I posted what I did. I don't work in the tech sector, so my real-life friends wouldn't know fdisk from fsck or grep from grub.

      if everybody is going to say "only check my site on Wednesdays", then search engines that adopt the new conventions will have unused bandwidth on other days of the week; this would mean that they would have to purchase a lot more bandwidth to achieve the same level of service.

      Oh... Oh... Oh... Sorry, you just touched a nerve... Don't worry, I won't bite. ;-)

      I don't *CARE* if those search engines search my site or don't. They aren't under any obligation to search me (as I don't pay them) and they aren't necessarily entitled to search me (as they don't pay me). If I want to restrict the time or the day on which I want to be "searched" and it's not convenient for them, then I guess I just won't be searched. Oh sure... the world loses out on not being able to find the text of that joke I posted... Boo hoo. There are lots of other fish in the sea.

      But if they've got the time and bandwidth, that's when I say it's OK. Otherwise, I'll use what tools I can and disallow the crawler entirely. I don't need 5, 10, 50 different spiders crawling my server at any given time. I don't host websites for those machines. I host my sites for the enjoyment of people.

      I imagine that most people who run webservers do it "for-profit". They practically have to BEG the search engines to come to their sites and give them high rankings. They pay big money for people to sit down and analyze this stuff to get the gods of Google to favor them at any given time.

      I'm just taking a guess, but maybe this is why crawlers tend to favor visiting me... I don't have the time or resources to dream up schemes to get my site at the top of a search, my links are clearly posted and easily read because most people get lost trying to find their cursor. Maybe that's what search engines are really looking for.

      And for the curious: If I scrape off all the search bots, code red worms, and the 404's from external references, I get *MAYBE* about 200 hits per day. Not unique visitors. Not page hits. We're talking total site hits on a good day.

      Webserving for me is a hobby. I know the limits of what I can do, and I'm not ashamed of them. If I was doing what I'm doing now for a living, I'd have died of starvation a long time ago. But it's fun, you know? Some people think it's nice, others scoff or ignore, but still- some think it's nice, and I don't necessarily have to care about what the other people think.

      In closing, if a programmer in this field is reading this, the argument I present should be fairly obvious, and the potential benefit should be obvious too. I'm not a programmer, but with what little I have dabbled in it, I'm guessing that implementing the suggestion would be of intermediate difficulty to a professional in this field. I'm not saying it's easy, but if it can be quickly and clearly stated, it's not impossible either... Though they may argue, "It's too much to ask."

      If that's their stance, then I've got my own reply:

      robots.txt

      User-agent: *
      Disallow: /

  33. Re:A question for US slashdotters... by Anonymous Coward · · Score: 0

    "Read your history, you dolt..."

    The UN says this, but you probably imagine the UN is some sort of conspiracy.

    Idiot.

  34. Uhm No by Tedium+Unleased · · Score: 5, Insightful

    Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.

    Oh yeah real nervous. They're getting on the bandwagon late; too late to monopolize this particular free (as in shut the fuck up) service. If by some miracle they produce something 'threatening', it will be because it's good or because the others have slacked off.

    1. Re:Uhm No by Anonymous Coward · · Score: 3, Insightful

      yeah! they only do well if they are first, you know, like with excel, and internet explorer, and a graphical user interface.

    2. Re:Uhm No by Keith+McClary · · Score: 2, Insightful

      If by some miracle they produce something 'threatening', it will be because it's good or because the others have slacked off.


      Usually they try to buy a competing company or hire the brains behind it.

    3. Re:Uhm No by HeghmoH · · Score: 1

      Yeah, they're getting on the search bandwagon late the same way they got on the PC bandwagon late, the office suite bandwagon, the browser bandwagon, the input devices bandwagon, the server OS bandwagon, and the gaming system bandwagon. Obviously they have to hope.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    4. Re:Uhm No by Jugalator · · Score: 1

      it will be because it's good or because the others have slacked off.

      Or because they dominate the consumer OS and browser market.

      --
      Beware: In C++, your friends can see your privates!
    5. Re:Uhm No by jbrw · · Score: 1

      Erm, Microsoft got on to the PC bandwagon very very early. Like, they were writing commercial PC software literally before anyone else.

      That sounds pro-MS. Eeep! :)

    6. Re:Uhm No by HeghmoH · · Score: 1

      Since we're talking about the early 80s here, I meant any Personal Computer, not just International Business Machines (and compatible) Personal Computers.

      Although, thinking about it more, I'm not sure if they were late, per se, but they certainly weren't there before anybody else.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    7. Re:Uhm No by flyneye · · Score: 0

      Microsoft couldn't find their ass in a lit room with both hands, a search engine and a map.

      --
      *Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
    8. Re:Uhm No by Anonymous Coward · · Score: 0

      Yes it's not like Google came in relatively late or anything. I remember when Lycos and Altavista were the good ones.

    9. Re:Uhm No by Tedium+Unleased · · Score: 1

      You're right... though people weren't as entrenched as they are now. It will be difficult for Microsoft to force anyone to use their search engine, especially with all the scrutiny they'll get based on prior dealings with IE and WMP. It's possible they'll make something good, but I'm not losing sleep over it. They started challenging the word processor and spreadsheet market in a time when people were still using typewriters and many offices had more filing cabinets than computers. There are a lot of things Microsoft didn't even try to take over, and I wonder why, like Adobe PDF style stuff. Was RTF their attempt? What else seems like it was within their grasp at one time or another? Tax software: they have Quicken-like stuff, but they haven't dominated that arena like Office has with word processing and spreadsheets. 3D rendering stuff like Maya? Probably a little beyond their scope and business plan, though an MS competitor to Photoshop wouldn't have surprised me, and still wouldn't actually. Photoshop is good, but it could do with some dumbing down and they could probably do some integration with Office that people might use (other people, not me =)).

  35. Gigabooo by vinit79 · · Score: 2, Funny

    Gigablast sucks : Proof - I entered my name and Gigablast says "no results". Did u mean "something thats not my name". No thanx I did not

    Google : My site is the first !!!

    And of course I refuse to believe that anyone in the world would be interested in anything but my home page.

    1. Re:Gigabooo by Rudy+Rodarte · · Score: 1

      Me too. Before my page comes up, the page listing me as one of Bethanie's fans comes up. What's up with that? Oh, plus it suggests I search for something other than my name. But, with Google, all is well again.

  36. Open Source Search Engine? by Anonymous Coward · · Score: 0

    are there any open source search engines out there that have wide spread use?

    and if not, why hasn't one been tried yet? (not to be cynical, but) I mean, there's open source everything else, so why not a search engine?

    1. Re:Open Source Search Engine? by idiotfromia · · Score: 5, Informative

      I don't believe it's actually being used in practice, but Nutch is developing rapidly. The largest test crawl they've completed has been about a hundred million pages. They're asking for donations to develop a larger demo system.

  37. Re:SCORE -1 ORACLE SYNTAX by Anonymous Coward · · Score: 0

    actually that's standard SQL syntax.
    it's the syntax you would use for PostgreSQL, MySQL and oracle.
    surprise, surprise: seems like SQL server is the odd one out.

  38. Re:A question for US slashdotters... by Anonymous Coward · · Score: 0, Offtopic

    The Iraqi invasion has happened, and there's not a lot anyone can do about it. A sudden withdrawal by Coalition troops would be a bad thing.

    I don't know about your Clinton points - Democrats have tended to be pro-Jewish in the past, but the recent endorsement from Bush looks like a blatant attempt to win the less-intelligent Jewish vote to me.

    Remember, the only way to resolve this sort of problem is through democracy. Look at Schleswig-Holstein for a situation where the victors (the Danes after WW2) took only the land that wanted to be theirs after a plebiscite. The Israeli problem can be resolved in a similar civilized way.

  39. First Pr0n Site? by Anonymous Coward · · Score: 0

    From the article:"There I designed and implemented the Artists' Den for Dr. Jean-Louis Lassez, the department chairman of computer science. It was an open Internet database of art. Anyone could add pictures and descriptions of their artwork into the searchable database. It was a big success..."

  40. HTML by Shinglor · · Score: 0, Offtopic

    That is the worst HTML I have ever seen, there's not even a starting or tag. I've looked through the source and I haven't got a clue how the browser is getting a title.

    1. Re:HTML by Anonymous Coward · · Score: 0
      <html><title>Gigablast</title>
      Yeah me neither.
  41. the future and the search by Maxim+Kovalenko · · Score: 1

    The idealistic part of me hopes that what this newfound competition will bring are more accurate searches. The cynical side of my being believes that it will be a no-holds-barred advertising onslaught that will cause us to see a resurgence in the "Search Engine Optimization" business that has helped clog search engines in the first place. At any rate, interesting times are ahead of us... and now that competition has heated up it will still be up to the user to separate the wheat from the chaff. And so it goes.

  42. Only thing other than google by modder · · Score: 1

    I've found CometWay to be quite useful. (Or does this fall under some sort of meta search?)

    I really only use google, but I've been able to find things here that I haven't on google, because of their categories. (Like wacky shell invocations people use as their sigs and what the hell they do.)

  43. Re:A question for US slashdotters... by Anonymous Coward · · Score: 0

    You give us a bad name. Shame.

  44. The value of pagerank by jfengel · · Score: 4, Interesting

    The most interesting assertion in the article was that Pagerank was useless. He says Google's real win is its ability to cache a copy of the page and show you a summary including your search terms. I do use that a lot to quickly exclude irrelevant pages.

    He said that his internal tests at Infoseek showed that pagerank didn't substantially improve the value of searches over simpler link analysis algorithms. I find that interesting, because I've worked with that algorithm and I know it's a stone bitch to compute.

    He might well be right. I like Google over the other search engines because the interface is simple and clean, and I find it pleasant to use. I'm reminded of Donald Norman's book on Emotional Design, about how we can get really attached to things that work for us.

    Google sells itself on pagerank, but at the very least it's insufficient against "search engine spam". If pagerank is less important than speed and utility, maybe I'll have something else programmed into my Firefox search bar. But not today.
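
    To show why it gets expensive, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not a claim about what Google actually runs: the toy link graph is made up, and the damping factor of 0.85 and the iteration count are just commonly used defaults.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical four-page web: everybody links to "c", nobody links to "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    Every iteration walks every link, so on a web-sized graph the cost is dominated by streaming billions of edges through memory many times over.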

    1. Re:The value of pagerank by paganizer · · Score: 1

      I'm willing to see what happens. and use it, on the way.
      The fact that he was a developer at Infoseek (remember Infoseek? the best pre-Google search engine until it was embraced and ritually slaughtered by Disney, much like CNET did to winfiles.com? but I digress) shows me that it has potential.

      --
      Why, yes, I AM a Pagan Libertarian.
  45. Given his resources, by halothane · · Score: 1

    I am very impressed.

    Gigablast is a search engine that I've been working on for about the last three years. I wrote it entirely from scratch in C++. The only external tool or library I use is the zlib compression library. It runs on eight desktop machines, each with four 160-GB IDE hard drives, two gigs of RAM, and one 2.6-GHz Intel processor. It can hold up to 320 million Web pages (on 5 TB), handle about 40 queries per second and spider about eight million pages per day. Currently it serves half a million queries per day to various clients, including some meta search engines and some pay-per-click engines. This, in addition to licensing my technology to other companies, provides me with a small income.
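
    A quick back-of-the-envelope check on those figures, using only the numbers quoted above (decimal gigabytes assumed; the variable names are mine):

```python
# Figures taken from the quoted paragraph.
machines, drives_per_machine, drive_gb = 8, 4, 160
max_pages = 320_000_000
pages_spidered_per_day = 8_000_000
queries_served_per_day = 500_000
peak_queries_per_sec = 40

SECONDS_PER_DAY = 24 * 60 * 60

total_gb = machines * drives_per_machine * drive_gb        # raw disk across the cluster
bytes_per_page = total_gb * 10**9 / max_pages              # disk budget per stored page
spider_rate = pages_spidered_per_day / SECONDS_PER_DAY     # pages fetched per second
average_qps = queries_served_per_day / SECONDS_PER_DAY     # average query load
headroom = peak_queries_per_sec / average_qps              # capacity vs. average load

print(f"{total_gb} GB raw disk, about {bytes_per_page / 1000:.0f} KB per page")
print(f"about {spider_rate:.1f} pages/s spidered, {average_qps:.1f} queries/s on average")
print(f"roughly {headroom:.1f}x headroom over the average query load")
```

    So the "5 TB" is the raw 5120 GB of disk, each page gets about 16 KB including index overhead, and the cluster is serving well under its stated peak query rate.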
    1. Re:Given his resources, by mikis · · Score: 1

      I'm both impressed and shocked. What if one of the machines or hard drives die? It doesn't look like he has any redundancy.

    2. Re:Given his resources, by Anonymous Coward · · Score: 0
      It doesn't look like he has any redundancy

      rtfa

      *SK* Data integrity is a big factor in large databases. How does Gigablast deal with the loss of a machine and with data corruption?

      *MW* Gigablast employs a dual redundancy scheme. So, every machine in the network has a twin host. You can configure it to have as many twins as you want. The machines in the network continually ping each other, and if they detect that one machine is not replying promptly to its pings, then they label it dead. At that point nobody will request any information from the dead host until it begins replying to his pings again. It's like poking someone to make sure they are awake before you ask them a question.

      Gigablast detects corrupted data and patches the data automatically from its twin host. If the twin's data is corrupt as well, then the data is excised and discarded.
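
      For anyone who wants to picture the twin scheme, here is a minimal sketch of the liveness part only. The Host class, pick_host function and the timeout value are all hypothetical names of mine, not Gigablast's code, and timestamps are passed in by hand so the example is deterministic.

```python
import time


class Host:
    """One machine in the cluster, optionally paired with a twin."""

    def __init__(self, name, twin=None):
        self.name = name
        self.twin = twin
        self.last_reply = time.monotonic()

    def record_ping_reply(self, now=None):
        # Remember when this host last answered a ping.
        self.last_reply = time.monotonic() if now is None else now

    def is_alive(self, timeout, now=None):
        # A host is labeled dead once it has been silent longer than timeout.
        now = time.monotonic() if now is None else now
        return (now - self.last_reply) <= timeout


def pick_host(host, timeout, now=None):
    """Return the host if it answers pings, else its twin, else None."""
    for candidate in (host, host.twin):
        if candidate is not None and candidate.is_alive(timeout, now):
            return candidate
    return None


a = Host("a")
b = Host("b", twin=a)
a.twin = b
a.record_ping_reply(now=0.0)    # a last answered long ago
b.record_ping_reply(now=10.0)   # b answered recently
chosen = pick_host(a, timeout=5.0, now=12.0)
print(chosen.name)              # b, because a has been silent for 12 seconds
```

      Once the dead host answers a ping again, record_ping_reply makes it eligible for requests again, which is the "poke before you ask" behaviour described above.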

    3. Re:Given his resources, by Tony+Hoyle · · Score: 1

      No RAID, no failover, relatively slow processors.
      I notice he doesn't mention the speed of the internet connection (40 queries a second isn't a lot - I expect google handles 10 times that at least).

  46. OPEN search engine? by Anonymous Coward · · Score: 0

    how about an open search engine? is it possible??? we could use p2p to index the web... would sure beat google...... it seems to be a good idea....

    post below.

  47. Obligatory..... by CastrTroy · · Score: 1
    1. Write program that crawls the web.
    2. Store text of web pages in Access Database
    3. Make web interface that allows text of pages to be searched in linear fashion.
    4. Host on a Pentium 2, on Personal Web Server, on Windows 98.
    5. ......
    6. Profit
    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    1. Re:Obligatory..... by Anonymous Coward · · Score: 0

      7. Get HAX0REDZ!!!!! WIF GREEETZ 2 ZER0COOL & R00TM4ST4

  48. How to build a search engine? by bakawally · · Score: 3, Funny

    I dunno. I better google it.

  49. What will Microsoft bring today? by Merovign · · Score: 1

    "and everyone's a little bit nervous to see what it's bringing."

    Embrace, extend, ??, profit.

    ??=buy & close

  50. originality by next1 · · Score: 1

    why try and make the results pages look exactly like google's with the green URLs printed under site listing and general formatting of results?

    they should make themselves stand out as something new and different, rather than try to imitate google.

    otherwise it does look like a good search and the "gigabits" (the related searches that appear at top of results) are an interesting idea.

    1. Re:originality by Tony+Hoyle · · Score: 1

      Not *exactly* like google... he's using a shitty courier font throughout the site.

      Looks ok in IE, but in other browsers looks like crap.

  51. Just wait until the duplicate... by Ayanami+Rei · · Score: 1

    is posted on a weekday next week just before lunch hour eastern time.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  52. When *I* worked with Matt Wells... by Anonymous Coward · · Score: 0
    ...he ran a pr0n site called "horny porny". (The domain belongs to someone else now--I wouldn't recommend visiting it.)

    It scoured newsgroups for pr0n and presented it in an organized way. What's interesting is that Matt omitted "hornyporny.com" from his bio site. I wonder why?

    Anyway, here's what Matt looked like circa 1998, when he used to be an infoseeker.

  53. Search engines could replace a query language? by JusTyler · · Score: 4, Interesting

    Fave quote from that article..

    However, I think that search engines, if they index XML properly, will have a good shot at replacing SQL.

    Discuss.

    1. Re:Search engines could replace a query language? by Tony+Hoyle · · Score: 1

      I think that apples, if they index XML properly, will have a good shot at replacing oranges.

    2. Re:Search engines could replace a query language? by 1iar_parad0x · · Score: 1

      IF they index XML properly...

      SQL is based on an algebra. Search engines are based on probability or fuzzy logic. I don't think search engines will completely REPLACE SQL.

      We use probability to describe phenomena when we really don't understand them. We use algebra, calculus, or traditional logic once we have a firmer grasp. It boils down to descriptive versus predictive science.

      --
      What do you mean my sig is repetitive? What do you mean my sig is repetitive? What do you mean....
    3. Re:Search engines could replace a query language? by IceAgeComing · · Score: 1

      I'm a little late here, but had to at least attempt to inject some actual information content into this thread. Read on to see if I succeed.

      The original quote is nonsensical under normal scrutiny, but we can give the quoter some slack and try "squinting" at his quote to offer meaning.

      Meaning 1:

      * The quoter thinks that web information, stored in XML datastructures, as opposed to an Oracle DB, is a really great idea.

      No! Actually, this is a really BAD idea. The idea behind databases is that one can access large volumes of information from really large volumes of data quickly. Think metaphorically of algorithms good at avoiding unnecessary disk reads in getting large records. That's what DB's do. To scan in all information before accessing what you want is sometimes....a real drag....very slow...wouldn't want to download all books from a library just to pick one, would you? No.

      Meaning 2:

      * The quoter thinks SQL is a database system. No, actually it's a way to ask a question. It's a language that is entirely independent of the database implementation underneath. So the quote shows a really disturbing lack of understanding. I hope whoever said it is not masquerading as someone smart; they could get someone hurt.

  54. I'm more impressed by his view of computer science by Anonymous Coward · · Score: 0
    I think he really hit the nail on the head about computer science in general (at least this is why I'm so passionate about it):
    All of this is the main reason I'm working with search now. I see the close parallel between the search engine and the human mind. Working on search gives us insights into how we function on the intellectual level. The ultimate goal of computer science is to create a machine that thinks like we do, and that machine will have a search engine at its core, just like our brain.
    Bingo. This is often what comes to mind when I'm trying to explain to someone why in the world I would code all day.
  55. Heh... by Xenographic · · Score: 4, Interesting

    I've often wondered why Google doesn't put up an "unsafe" image search option? (e.g. leave out all the images it deems "safe").

    Then again, it hardly needs to most of the time...

    1. Re:Heh... by saqq · · Score: 1

      they do... go to preferences-> safesearch filtering

      --

      small flowers crack concrete
  56. Microsoft Party Crashing by Nom+du+Keyboard · · Score: 3, Insightful
    Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.

    Everybody knows what Microsoft is bringing. Well almost everybody. Okay, I'll spell it out:

    1: Bring lots of money.
    2: Buy out a competitor.
    3: Rename it Microsoft Search.
    4: Attempt to trademark the word "Search".
    5: Bind it tightly into Windows as an essential service.
    6: Don't get it right until version 3.0.
    7: Profit!

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:Microsoft Party Crashing by fatphil · · Score: 1

      4. Attempt to trademark the word "Search".

      What do you mean 'attempt'?
      On the assumption of:

      1: Bring lots of money.

      Then they'll have "search", "search engine" and a hundred other terms trademarked by last week.

      And shouldn't 6 read "until version at least 2004".

      FP.

      --
      Also FatPhil on SoylentNews, id 863
    2. Re:Microsoft Party Crashing by vinlud · · Score: 1

      3.b Add functionality to MSIE so visited domains are automatically indexed by MS Search, thereby becoming the largest index of the internet.

      --
      Repeat after me: We are all individuals
  57. less commercialism by dj245 · · Score: 4, Interesting
    I did a quick search on Gigablast for "Radio control speed controler". Now normally, on google, you would get a couple million pages of websites wanting to sell you a speed controller. On gigablast, however, The first 10 results were pretty much information about speed controllers, and/or battlebot sites that explained what you would need them for.

    I've noticed lately that Google seems to be filling up with websites wanting to sell you stuff (even if they don't use spamming techniques). Perhaps these little guys can put the pressure on Google to get some better algorithms. Or perhaps it's time for Google to fade into the past like Altavista did a couple years ago and make way for the new.

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    1. Re:less commercialism by Anonymous Coward · · Score: 0
      I want news... I want stuff. That matters!!

      I want search results. I want information that matters! Not ads!

    2. Re:less commercialism by Neo's+Nemesis · · Score: 0
      I did a quick search on Gigablast for "Radio control speed controler". Now normally, on google, you would get a couple million pages of websites wanting to sell you a speed controller. On gigablast, however, The first 10 results were pretty much information about speed controllers, and/or battlebot sites that explained what you would need them for.

      This is what too much indexing will bring you sometimes. Google is based on making the internet like a marketplace: you don't only see the pizza parlours, but also ads for places claiming to have the most delicious pizzas, jobs for pizza boys, food items that are pizza derivatives, and others.

      If you donate some cycles to gigablast, and let them build on their index, then you sure would get lots of links, many of which will include some info that really doesn't matter to you.
    3. Re:less commercialism by a.ameri · · Score: 3, Interesting

      I was actually looking for some daily ISO snapshots of the Debian sid repository. I'd never heard of Gigablast before, so I gave it a shot and searched for 'daily sid snapshot iso'. Gigablast found no results, Google found 785, and looking at the first 10 results, I was easily able to find what I was looking for.

      C'mon, yes, Google's interface is cool and stuff, but Google's success isn't just its interface. Their search algorithms are rock solid, they are continually improving them, and Google returns the most relevant results of any search engine.

      I still think that Google's biggest advantage over others is their search algorithms and their method of indexing webpages. Anyone can copy the interface. But not everyone can build what Google has built: rock solid searching algorithms, a clustered scalable filesystem GFS, their own webserver GWS (albeit a modified Apache) and they are reportedly making their own OS, on which anyone can have an account! Add to these numerous useful facilities like their Linux/BSD/MS/Mac search, their newsgroup search, their news section, etc., and you see why Google is successful, and well, others aren't that much.

      It's not 1996 anymore, when all you had to do was write a couple of perl scripts and install NCSA on a *Nix with a medicore DBMS, and viola, you had a search engine. These days, barriers to entry in the search engine field are very high. Google reportedly uses 100k servers. These days, it's a business that needs lots of capital, and knowledge, technical know-how, and labour, to start with.

      --
      -- /* Those who don't underestand Unix, are condemned to reinvent it poorly */
    4. Re:less commercialism by Anonymous Coward · · Score: 0

      It's not 1996 anymore, when all you had to do was write a couple of perl scripts and install NCSA on a *Nix with a medicore DBMS, and viola, you had a search engine.

      I assume "medicore" is some kind of core health product? And "viola" -- what does a musical instrument have to do with search engines? Man, normally I wouldn't be a spelling Nazi, but those two stupid mistakes are just too much to bear!

    5. Re:less commercialism by mabu · · Score: 1

      I've noticed lately that Google seems to be filling up with websites wanting to sell you stuff

      This is the inevitable entropy that all search engines are subject to. In the beginning they have to be content-heavy because they can't make money; once they get the market share, they become more commercial. This shouldn't be a surprise.

      Google isn't perfect, but in comparison to every other search engine out there, I'm of the opinion that Google is one service that hasn't sold out to the degree of its counterparts. I cannot imagine any competition not getting worse if they had Google's market share.

    6. Re:less commercialism by Anonymous Coward · · Score: 0

      Oooh boy. I'm looking forward to the $300 GoogleOS which will be the only OS capable of correctly using the Google search engine which is the only WWW search engine left after all the others have been run out of the market using anticompetitive behavior.

  58. Re:Hmmm....Talk About Stealing... by Nom+du+Keyboard · · Score: 1
    goes as far as to link to OTHER search engines to help out your search

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  59. Re:Hmmm....Talk About Stealing...Completed Post by Nom+du+Keyboard · · Score: 1
    goes as far as to link to OTHER search engines to help out your search

    How long before other search engines start considering this stealing? I mean, I could have a search engine running tomorrow, if all it did was link to Google and return hits to my own bannered page.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  60. No.... by Anonymous Coward · · Score: 0

    Am I the only one who's never heard of Gigablast...
    Nope, I'd never heard of them either. We BOTH have now.

    but then not too many years ago, I remember a time when I've never heard of Google.
    Yeah, until they submitted that "infomercial" story to Slashdot and snuck their name in there with the big names of the time, as though they were one of just a few search engines. Oh wait....they didn't do that. Google succeeded by being a good search engine.

    I'm surprised the title of this story wasn't "make money fast."

  61. Perl port of Lucene recently available by persaud · · Score: 1
    Note that Lucene has been ported to Perl by Simon Cozens and Marc Kerr: Port funded by Kasei, released Feb 2004.
    1. Re:Perl port of Lucene recently available by whowho · · Score: 1

      very interesting, thanks!

  62. What I'd Like To See In A Search Engine by Nom+du+Keyboard · · Score: 5, Insightful
    What I'd like to see in a search engine is a page kill or broken link feature to keep it current. If I click a link that is broken or vastly changed (e.g. the link to ancient Chinese pottery is now a porn site), that I could backup to the search results page and click a link to have them immediately re-crawl that page. I think it would make for better results, and am surprised that it's not already common.

    You may license my patent on this idea for reasonable terms in exchange for shares of your company's stock.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:What I'd Like To See In A Search Engine by igrp · · Score: 2, Insightful
      Well, I think the potential for abuse would just be too great.

      Respidering a website doesn't just take up bandwidth but also a lot of CPU cycles. That's especially true if you're running extensive algorithm-based computations (like Google does) and not just doing a quick-and-dirty instant-add to a database. It would also allow webmasters to cheat the system: temporarily mirror some relevant, high-traffic site, have it reindexed, change the contents (porn, spam, you name it). After a while, the bot will reindex your site to see if your site's content has changed. At that point, all a rogue webmaster would have to do is repeat the process.

    2. Re:What I'd Like To See In A Search Engine by harmonica · · Score: 2, Informative

      If I click a link that is broken or vastly changed (e.g. the link to ancient Chinese pottery is now a porn site), that I could backup to the search results page and click a link to have them immediately re-crawl that page.

      The index is usually updated only once every couple of weeks. Recomputing PageRank (or whatever everybody else uses) takes its time. That's why more or less immediate updates are reserved only to the best-known sites.

      You can report 'false' results with the Dissatisfied? link at the bottom of the Google result page.

    3. Re:What I'd Like To See In A Search Engine by Nom+du+Keyboard · · Score: 1
      You can report 'false' results with the Dissatisfied? link at the bottom of the Google result page.

      While I've never used the link you mention, they could certainly make this much easier with my suggestion. Even if they put a 'dissatisfied' link next to each entry and pre-filled in the reply form with the site information, leaving you to either check one of several fixed reasons, or leave a comment. This would automate much of their processing.

      I truly doubt that servicing user complaints about mislabeled sites would add much to their overall spidering load.

      --
      "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  63. Re:A question for US slashdotters... by im+a+fucking+coward · · Score: 0, Offtopic

    2) Okay, the joke's over. Bring back Clinton. He never would have [pulled UN support troops and watched while Hutus hacked 800,000 Tutsis to death with machetes, then dallied over the semantics of 'genocide'.] Whoops! I guess we should bring Clinton back after we arm the Sunnis and Shiites with machetes? Or maybe if we need to carpet bomb Iraq to draw attention away from his latest blowjob?

    All joking aside, your wish for the US to get the hell out of the Mid East is smart, but it's far too late. Talk to your terrorist pals and let them know that America will be invading their favorite host countries for generations to come because of one fucking idiot: Osama Bin Laden. You can also thank OBL for giving Sharon a blank check signed by the good ol' US of A. Go ahead and smear a wad of credit on Yasser and Hamas concurrently.

    If you really want the US the hell out of that mess, just convince radical Islamic elements to stop blowing up innocent civilians, and to carry out their disagreements via discussion, argumentation and/or treaty. Take some goddamned responsibility for stupid acts of terrorism, bring it to a halt, and US public opinion will force an immediate withdrawal. Otherwise, it doesn't matter which party is in power, we're there to stay.

  64. One guy, eight computers by crisco · · Score: 1
    Did you read the article? Gigablast is one guy with eight computers. He thinks he can approach the size of Google's index (5 billion pages) this year if he invests all of his earnings into hardware and bandwidth. He's also well aware of the search engine spam problem and has built anti abuse features into it.

    Given that, plus the fact that he's spidered my worthless blog, I'm pretty impressed. Definitely something to watch.

    --

    Bleh!

  65. haha by Anonymous Coward · · Score: 0

    you all think ur smarter than ur

  66. for a top search engine by Neo's+Nemesis · · Score: 0

    you need an awesome combination of 2 things: ALGORITHM and STORAGE
    Some people say that a "killer" algorithm would get you there, but seriously, a search engine is supposed to give you relevant results for everything, and if you search for "Senior employees working at Sugarcane plant", you won't get that with just good scripting. Where the hell do you store it all, then?
    But algorithms are also important, as that's what keeps us visiting Google rather than MSN

  67. The Key to Winning the Web by xeon4life · · Score: 1

    In all seriousness: Make the address and interface simple. For instance Google is very easy to type. It has become as easy to type as my name. You can keep your fingers on two keyboard buttons for 4/6th of the way through the name and move the two fingers to 'l' and 'e'. It doesn't take long to load up, either, which is the second best thing one can do. Vivisimo is a crazy address, and will never become popular. You first heard it from me.

    -Xeon

    --
    Real programmers can write assembly code in any language. -- Larry Wall
    1. Re:The Key to Winning the Web by bhima · · Score: 1
      But in FireFox, the difference between the BBC news service and the Google news service is "news.b" compared to "news.g" (plus the tab & return)

      and if I want "www.groklaw.net" rather than "www.google.com" it's "gr" rather than "go".

      So I doubt I'd ever really type "www.vivisimo.com" in its entirety but only "v" or "vi".

      --
      Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
  68. If MicroSoft Comes to the Table . . . by Newt-dog · · Score: 0

    IF MicroSoft comes to the table with their own search engine, you can be assured that there will be dozens of bugs that we can exploit to make our sites rank in the top 10, or that coveted "first page" of search.

  69. give Gigablast some respect... by Anonymous Coward · · Score: 0

    Most websites that review search engines have a high respect for Gigablast (e.g. webmasterworld.com). For a small operation he does an excellent job and has a very good product with potential...

  70. Even SearchKing by PlatinumInitiate · · Score: 1

    Even SearchKing is better known than Gigablast... and SearchKing pretty much faded into obscurity after the Google/SearchKing problems a while back.

  71. In other news... by sydbarrett74 · · Score: 4, Funny

    'In other news, Google announced the buy-out of Gigablast. The newly-formed company will be called Giggle.'

    --
    'He who has to break a thing to find out what it is, has left the path of wisdom.' -- Gandalf to Saruman
    1. Re:In other news... by Anonymous Coward · · Score: 0

      Ah, but you forgot the announcement that Gigablast and Booble merged. After all three companies are through, the new corporate name is GiggleBust.

  72. Re:Hmmm....Talk About Stealing...Completed Post by vegaspctech · · Score: 1

    I could have a search engine running tomorrow, if all it did was link to Google and return hits to my own bannered page.

    And if you did you would have a web site that didn't really provide its own content but rather generated it by retrieving data from elsewhere on the Internet. Is this not what google does? But they add value to it, just as you would have to do in order to see any significant traffic.

    The Internet is us and we're a bunch that lives just about everywhere and does just about everything. If google, yahoo, Microsoft, SCO, IBM and a hundred like them all disappeared from the face of the planet tomorrow, the Internet would be just fine. What they bring to our table is of insignificant value compared to what we bring. And what value they add has value only because we've invited them to our table. These days too many companies benefiting from access to our Internet are like guests who, having brought passable wine, wish to claim credit for the success of the feast.

    If you'd like to set up a site that pulls its content from google, go for it. Just be sure to allow them to take from you what you would take from them. That's how our feast works. We all bring a little something to it and share.

    --

    Making the world a better place, one psychotic episode at a time.

  73. Re:Hmmm....Talk About Stealing...Completed Post by Nom+du+Keyboard · · Score: 1
    These days too many companies benefiting from access to our Internet are like guests who, having brought passable wine, wish to claim credit for the success of the feast.

    That's a poetic way to put it.

    My way of looking at it is that for the first time, everyman has a microphone and a soapbox from which to speak to anyone in the entire world who wishes to listen. While I realize that absolutely has to upset many people entrenched in power, I feel it is the finest example of free, equal, and unfettered speech that has ever existed -- and is very much worth defending greatly.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  74. Re:SCORE -1 ORACLE SYNTAX by Osty · · Score: 2

    surprise, surprise: seems like SQL server is the odd one out.

    Nope, SQL Server handles that syntax just fine. However, unlike C, the ; is unnecessary unless you're stringing multiple commands together on the same line. This is not SQL Server syntax, but ANSI SQL syntax. Most (all?) SQL developers don't bother with semicolons unless they're doing multiple commands on a single line. And since any good DB developer is not writing dynamic SQL (ie, "SELECT * from foo" from PHP, ASP, Perl, etc), but calling stored procedures through proper mechanisms (ie, not creating a dynamic query of "EXEC sp_foo param1, param2"), they typically don't bother.
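
    To illustrate the "proper mechanisms" point, here is a small sketch. It swaps in Python's built-in sqlite3 module and parameter binding as a stand-in, since SQLite has no stored procedures; the table foo and the find_by_name helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo (name) VALUES (?)", [("alpha",), ("beta",)])


def find_by_name(conn, name):
    # The value travels as a bound parameter, never pasted into the SQL text.
    cur = conn.execute("SELECT id, name FROM foo WHERE name = ?", (name,))
    return cur.fetchall()


rows = find_by_name(conn, "beta")
# A hostile string is treated as plain data, so it simply matches nothing.
hostile = find_by_name(conn, "beta' OR '1'='1")

# A single statement runs the same with or without the trailing semicolon.
with_semicolon = conn.execute("SELECT COUNT(*) FROM foo;").fetchone()
without_semicolon = conn.execute("SELECT COUNT(*) FROM foo").fetchone()

print(rows, hostile, with_semicolon, without_semicolon)
```

    The same idea applies when calling a stored procedure on a server database: the driver passes the arguments separately instead of building an "EXEC ..." string by hand.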

  75. Personal Search Engines. by torpor · · Score: 1

    You know what I want?

    My own search engine, running on my POSIX-capable machine, indexing and organizing 'bookmarks', though I suppose at that point they won't be called bookmarks, and thank god for that.

    RIP, bookmarks!

    Anyway, my own search engine need not be for anyone but me, and it can search and index and process whatever web content I feed it, for later 'search' and organization.

    That would be -far- more useful to me than another 'gotta be on the net' web-service ...
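
    The core of such a thing is small. Below is a rough sketch of an in-memory inverted index over page text you feed it; BookmarkIndex, tokenize and the example.org URLs are hypothetical, and a real tool would also need fetching, persistence and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class BookmarkIndex:
    """Maps each word to the set of URLs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query):
        # Return URLs containing every word in the query (AND semantics).
        words = tokenize(query)
        if not words:
            return set()
        results = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            results &= self.postings.get(word, set())
        return results


index = BookmarkIndex()
index.add("http://example.org/a", "Building a search engine in C++")
index.add("http://example.org/b", "Search tips for Debian users")
print(index.search("search engine"))
```

    Only the first page contains both words, so only its URL comes back.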

    --
    ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
    1. Re:Personal Search Engines. by pauljlucas · · Score: 1
      My own search engine, running on my POSIX-capable machine, indexing and organizing 'bookmarks' ...
      You could probably use SWISH++ and a modified version of the included httpindex script to do exactly what you want.
      --
      If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
  76. Only five? by adriantam · · Score: 5, Interesting

    Search is a fiercely competitive arena, even though there are really only five Web search companies today: Google, Yahoo (Altavista/AlltheWeb/Inktomi), Looksmart (Wisenut), AskJeeves (Teoma), and Gigablast.

    I am a Chinese speaker, and East Asian writing is traditionally character-based, with no alphabet. That means we don't separate words with blank spaces but rather dosomethinglikethis. This characteristic causes many problems for search services, because you never know whether splitting that string into dos ome thingli keth is is right or not. We have to build a dictionary into the search engine, which makes it different from engines for many Western languages. So I don't believe there are only five search engine providers in this world. At the very least, I know of a list of additional search engines developed to support East Asian languages.
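
    To make the dictionary idea concrete, here is a sketch of greedy longest-match segmentation, one simple baseline and not necessarily what any particular engine uses. The segment function and the toy English dictionary are mine, standing in for a real word list.

```python
def segment(text, dictionary, max_len=None):
    """Split unspaced text by always taking the longest dictionary word."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback for unknown text.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


toy_dictionary = {"do", "some", "thing", "something", "like", "this", "is"}
print(segment("dosomethinglikethis", toy_dictionary))
# ['do', 'something', 'like', 'this']
```

    Greedy matching can still pick the wrong split when a long word swallows the start of the next one, which is part of why the problem is harder than it looks.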

    --
    http://www.ieaa.org/~adrian/
    1. Re:Only five? by pe1chl · · Score: 3, Funny

      When an American writes "there are only five companies that..." he really means: "there are only five companies IN THE USA that...".

    2. Re:Only five? by dajak · · Score: 1

      There are hundreds of search engines that only index pages in a single language, and I also know some specialized search engines for finding historical documents that use language-specific heuristic rules (for consonant shifts) for alternative or old spelling rules. English is particularly easy to search on because nouns are compounded with spaces, but that's not the case in a number of other European languages. So even if generic search engines like Google work for other languages, it is always possible to improve recall by taking language-specific heuristics into account.

      The fiercely competitive American search engine market will of course quickly converge because fierce monopolist Micro$oft has entered, but the rest of the world is still safe because most departments of Micro$oft (except the legal department basically) have apparently not yet discovered that it exists.

  77. Gigablast... 2 years old and nobody's heard of it! by mbauser2 · · Score: 4, Interesting

    I have heard of Gigablast, but I've never been impressed by it. (I wrote a review back in 2002.) Most search engine optimizers love Gigablast, however, because it's such an easy engine to game.

    It's a fairly old-school engine: indexes whatever it can and favors pages that are keyword-heavy. It's almost too easy to spam. I don't think there's anything PageRank-like in the algorithm, otherwise, it wouldn't be able to add pages to the index "instantly". (PageRank is too computationally intensive for that.) Gigablast still thinks meta-tags are a great idea! While the hardware setup might be innovative (I'll leave that to the hardware experts to decide), the engine software itself seems about ten years behind the times.
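
    For reference, this is roughly what the textbook PageRank computation looks like, as a sketch only: a plain power iteration over a toy link graph, with a damping factor and iteration count I picked, and certainly not Google's production code. Every pass walks every link in the graph, which is why adding one page "instantly" doesn't fit.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    In the toy graph, page c collects the most links and ends up on top, while d, which nobody links to, ends up last.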

    Like many posters here, I doubt a one-man outfit is going to take down Google (although many search engine optimizers would like it to). Gigablast has had two years to make an impression, and it hasn't. A company on an acquisition binge might be crazy enough to buy it, but I wouldn't hold my breath.

    --
    Proud to be / Smiley-free / Since Nineteen / Ninety-Three
  78. I'm ready to change by Andy_R · · Score: 4, Interesting

    Wonderful as Google is, I'm finding more and more searches don't produce useful results.

    I keep getting high rankings from sites like bizrate and kelkoo, which don't have any content whatsoever, but have convinced google to show pages that say "search for best prices on xxxx" where xxxx is my search term. Often the problem is so bad that I don't see any sites with content until page 2 of google.

    Another issue is with searches for song lyrics. There are dozens of identikit advert sites which drown a tiny (and often inaccurate) text payload in a swarm of adverts. Finding a site written by someone who cares about accuracy is getting impossible.

    What I want is sites ranked by volume of relevant content, with a negative ranking element for duplicate sites and a stronger negative ranking for multiple adverts.
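
    Something like the following toy scoring function is what I have in mind. All the numbers, weights and site names are made up for illustration; how a crawler would actually count "relevant words", duplicates or adverts is left open.

```python
def score(relevant_words, duplicate_count, advert_count,
          duplicate_penalty=50.0, advert_penalty=100.0):
    """Reward relevant content; penalize duplicates, and adverts more heavily."""
    return (relevant_words
            - duplicate_penalty * duplicate_count
            - advert_penalty * advert_count)


# Hypothetical measurements for two pages matching the same query.
pages = {
    "fan-site": {"relevant_words": 800, "duplicate_count": 0, "advert_count": 1},
    "lyrics-farm": {"relevant_words": 300, "duplicate_count": 12, "advert_count": 8},
}

ranked = sorted(pages, key=lambda name: score(**pages[name]), reverse=True)
for name in ranked:
    print(name, score(**pages[name]))
```

    With these weights the fan site scores 700 and the lyrics farm scores -1100, so the advert-heavy duplicate sinks to the bottom.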

    Oh, and what I would also find useful is a 'go (after blocking adservers)' button instead of a 'go' button.

    --
    A pizza of radius z and thickness a has a volume of pi z z a
    1. Re:I'm ready to change by pe1chl · · Score: 1

      I noticed the "kelkoo problem" too.
      I wonder if they pay to get these results, or if this is just a confusion of Google caused by the fact that kelkoo has similar sites in many different domains that all link to each other.
      So, Google thinks there are lots of links to a certain page and thus gives a high ranking.

    2. Re:I'm ready to change by Anonymous Coward · · Score: 0

      A good tip from another /. post somewhere, add "-search" to your search pattern and that kills a lot of the crud on google.

    3. Re:I'm ready to change by Anonymous Coward · · Score: 0

      Even simpler, a toggle to include/exclude commercial sites. Exclude by default.

    4. Re:I'm ready to change by bradshaw-ka · · Score: 1

      Amen!

  79. Check out my search engine! by Milton+Waddams · · Score: 2, Funny
    function search(){
    # quote the pattern so multi-word queries survive word splitting
    grep -- "$1" < The_Internet
    }
  80. Or as Google says: by MyFourthAccount · · Score: 1

    Did you mean: Radio control speed controller

  81. Make it like a human brain... by Pedrito · · Score: 4, Funny

    I liked this quote: "Now that the Internet is very large, it makes for some well-developed memory. I would suppose that the amount of information stored on the Internet is around the level of the adult human brain. Now we just need some higher-order functionality to really take advantage of it. At one point we may even discover the protocol used in the brain and extend it with an interface to an Internet search engine."

    The protocol used in the brain? That can't be a good direction to go. I mean, if it's anything like my memory and, honestly, the memory of most people I know, it's definitely going to be a step backwards. Human brains can hold a lot of information, but retrieval is definitely not their specialty. I can see it now. Type in my search terms and the engine comes back with, "ummm, it's right on the tip of my tongue. Okay, I don't have a tongue, but I just about remember it. Give me just a minute to think about it. umm... umm... Nope, it's gone. Never mind."

  82. Microsoft by Anonymous Coward · · Score: 0

    Microsoft is also coming to the party, and everyone's a little bit nervous to see what it's bringing.

    More bloody potato salad, I expect.

  83. off topic...random web pages by -O.ster_66 · · Score: 2, Informative
    not sure if you're aware of the STUMBLEUPON toolbar. it's pretty cool for doing the random surf, plus you can also start to customize it to your tastes through the use of categories.

    --
    "You get all the fun of sitting still, being quiet, writing down numbers, paying attention...science has it all."
  84. To quote Trump... "you're fired!" by Anonymous Coward · · Score: 1, Funny

    I'd fire you in an instant with sloppy code like that.

    You forgot to add "Order by nipple_size desc".

  85. How to create a web search engine... by fizban · · Score: 2, Funny

    1. Buy license for existing web search engine.
    2. ???
    3. Profit!

    --

    +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

  86. Scalable?!? by Himring · · Score: 1

    of creating a modern, scalable search engine.

    Scalable? Please. One main reason I read /. is because it's real and cool. I cannot stand it if it starts using pop-IT lingo such as every well-polished vendor spills out to me on a daily basis.... If I see "interoperability" down further on the news page today, then I'm gonna not read /. for a week!...

    --
    "All great things are simple & expressed in a single word: freedom, justice, honor, duty, mercy, hope." --Churchill
  87. a9.com is Amazon's web search entry by squashed · · Score: 2, Informative

    Have a look at a9.com, which is Amazon's new search entry. Aside from a good web search engine, it provides a "history" of your previous searches and other innovative features.

  88. C++ for the real work by Anonymous Coward · · Score: 0

    Sorry - no stinkin' Java for massive internet web searching. I am Gary Google and I use C++.

  89. ModParentUp by monkeyfamily · · Score: 1

    funniest /. comment in months!

  90. yahoo? by Anonymous Coward · · Score: 0

    Yahoo (Altavista/AlltheWeb/Inktomi)

    Altavista is DEC, innit?

  91. isn't anyone nervous about dumbfind? by dumbfounder · · Score: 1

    well they should be! the titans will fall! muahahahahahahahahahahahahahahaha -cough-

    dumbfind.com

  92. I guess he doesn't like criticism.... by mbauser2 · · Score: 1

    Replying to myself to add: Matt Wells just called me at home to complain about my review of Gigablast. He apparently thinks I wrote a negative review of his site because I was abused as a child. It's an interesting theory, to say the least.

    I wonder if he's planning to call everybody who criticized Gigablast in this thread?

    --
    Proud to be / Smiley-free / Since Nineteen / Ninety-Three