Slashdot Mirror


Interview With Google's Director of Research

Cialti writes "Salon has a very interesting interview with Monika Henzinger, Google's Director of Research, about their search technology and where they're going with it."

63 of 135 comments

  1. Re:Voice activated search engine by Anonymous Coward · · Score: 2


    (car cuts driver off)
    "Fuck you, asshole!"
    (computer beeps)
    [25,945 results found.]

  2. Actual Questions for Ask Jeeves by Tony+Shepps · · Score: 2
    In the good old days, ask.com let you see everything being asked of Jeeves, unfiltered. I watched it for a while, saving off the really weird questions, and made a page of it here.

    Happy reading, and remember, you're looking at the end of the human race.

  3. Re:Prepositions need love too by Malc · · Score: 2

    Judging by the article, they build lists of words, and find their intersection. I can't imagine how big the lists for common words (e.g. articles) would be. Perhaps they had to cut them out due to hardware constraints?

  4. Deja by Tet · · Score: 2

    The most interesting part about the interview was the snippet that implies Google didn't have much of a say in the Deja archives being down after the buyout. So it wasn't the complete cock up that we all thought it was. They still handled the PR really badly, though. If they'd just told people what was happening, I'm sure they wouldn't have come across half as badly as they did.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  5. Re:[ot]Google's data structure? by K-Man · · Score: 2

    The documents are assigned id's 1..n and, for each word, an ordered list of id's of documents containing the word is constructed. When a search asks for, say, "cheese fondue" the array for "cheese" and the array for "fondue" are retrieved and merged using a sorted list merge (fast, since the arrays are already ordered). The result is a list of document id's that were in both lists, i.e. documents containing both words.

    There are various ways to speed this up by compressing the arrays, hash joins, etc., but the basic idea is the same.
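The sorted list merge described above can be sketched in a few lines. This is a minimal illustration, not Google's code; the `postings` dictionary and its document IDs are made up for the example.

```python
def intersect_sorted(a, b):
    """Return the IDs present in both sorted lists, via a two-pointer merge."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # document contains both words
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1             # advance whichever list is behind
        else:
            j += 1
    return out

# Hypothetical posting lists: word -> sorted IDs of documents containing it.
postings = {
    "cheese": [2, 5, 8, 13, 21],
    "fondue": [5, 9, 13, 34],
}

both = intersect_sorted(postings["cheese"], postings["fondue"])
print(both)  # [5, 13]
```

Each list is walked once, so the cost is proportional to the combined length of the two lists, which is why keeping them ordered matters.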

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  6. Re:[ot]Google's data structure? by K-Man · · Score: 3

    That's true if the data is changing. However, most search engines do web crawls in large chunks, and index the data once in one large block. Under such conditions dynamic management of hit lists and other data structures is not necessary. Basically, the bytes are packed as tight as they can get them so that it all fits into memory.

    As far as I can tell from their paper, Google manages its web crawls the same way. It partitions the data into "barrels" and indexes each separately. Once the indices are built, they aren't updated. They also extend the hit lists to include word position and some other attributes for each hit.

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  7. Re:Prepositions need love too by Zagadka · · Score: 2

    Now bear in mind that Google couldn't even come up with the phrase, however much I +'d it to death, on its top ten list. If I only have that one phrase in memory on Google, I can't find it.

    The problem is that you +'ed it too much. If you search for +"+but +that +the +dread" you'll notice that it gives you some warnings. Google's ignoring all of the +'s you added, because you're using some of them incorrectly. ("dread" is not a stop word, for example)

    Instead, try searching for "but +that +the dread". Then you'll get what you're looking for.

  8. Re:Voice activated search engine by FFFish · · Score: 2

    Oh, gahd.

    That's just great. Now the cell-phone dolts in the SUVs will be using Google *at the same time* to check on their facts, *while* they are driving...



    --
    Don't like it? Respond with words, not karma.
  9. Masturbation Techniques by ergo98 · · Score: 5

    Google absolutely blows away the competition. Still, it is humorous to see entries in my log file from people looking for masturbation tips (from beginner-level "How To" style queries to full-blown searches for advanced techniques). The page in question is entitled "Hey Jerk : Get Off My Computer!" (it is about pop-up ad windows), and I'm, uh, proud to see that it ranks #2 for searches for "jerk off technique" (I've had dozens of related hits). While it is funny to see searches go a little off-track, I am very curious how many consumers know that each link you follow passes on where you came from. For instance, I see log entries like

    200x-xx-xx xx:xx:xx xxx.xxx.xxx.xxx GET /rants/jerk/index.htm 200 5986 334 270 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) http://google.yahoo.com/bin/query?p=jerk+off&b=21&hc=0&hs=5
    -or-
    200x-xx-xx xx:xx:xx xxx.xxx.xxx.xxx GET /rants/jerk/index.htm 200 5986 437 1292 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt;+sureseeker.com) http://www.google.com/search?q=guys+who+jerk+off
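For anyone wondering how little work it takes to read the search phrase out of a referrer, here is a rough sketch. The parameter names `q` and `p` are taken from the two log lines above; the function name and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Extract the search phrase from a referring URL, or None if absent."""
    query = parse_qs(urlparse(referrer).query)
    # 'q' and 'p' are the query parameters seen in the log entries above.
    for key in ("q", "p"):
        if key in query:
            return query[key][0]   # parse_qs already turns '+' into spaces
    return None

# Hypothetical referrers in the same shape as the logged ones.
print(search_terms("http://www.google.com/search?q=cheese+fondue"))
print(search_terms("http://google.yahoo.com/bin/query?p=cheese+fondue&b=21&hc=0&hs=5"))
print(search_terms("http://example.com/index.html"))
```

The first two calls print the phrase "cheese fondue"; the last prints None because the URL carries no query string.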

    1. Re:Masturbation Techniques by daviddennis · · Score: 2

      The OmniWeb browser on MacOS X has a very nice feature, enabled by default, which simply disables all pop-up windows. You can disable all pop-ups, or disable only pop-ups that are not the result of you manually clicking on a link.

      Unfortunately, OmniWeb's JavaScript support is lacking in other areas, but that feature is brilliant, and their text display is the cleanest I've ever seen in any program. Linux users should get MacOS X just to rest their bad font weary eyes :-).

      D

      ----

    2. Re:Masturbation Techniques by Krilomir · · Score: 2

      This is one of the reasons I found Gnutella fun when it first came out ... just looking at all those searches. It became even more fun when people began using the Gnutella-search-stream as a chat-feature ;)

  10. Re:why I like google by daviddennis · · Score: 2

    Perhaps the best news, though, is that

    http://www.google.com/windows/

    doesn't work. Great job!

    D
    ----

  11. Voice activated search engine by funkman · · Score: 2
    They are working with BMW to see if they can integrate the search engine into the car to do a search based on what you say.

    Even out of the scope of a car - this feature would be awesome if it were integrated with cable (or satellite) and the TV room.

    Get me Gilligan's Island ... Click

    1. Re:Voice activated search engine by funkman · · Score: 3
      I love when people don't read the article and post. From page 2 of the article:
      What other kinds of search are you developing?

      We have a voice-search project with BMW -- BMW wants to put voice search into their 7 Series cars. They want to put microphones in the cars -- you can just speak whatever your search is and then it gives you answers back on a display. Then you just say the result number and the search jumps to that result.

  12. Re:Perks by ethereal · · Score: 2

    All search engines spider ahead of time and store; to do otherwise would take forever to get you any search results ("It's a terrible strain on the animators' wrists." :) My impression from the article was not that they generate whole searches ahead of time, but that they categorize by the individual search words, and then when you type in a query they generate the intersection of the pages on their many word lists. Then one miracle occurs, and ...

    Caution: contents may be quarrelsome and meticulous!

    --

    Your right to not believe: Americans United for Separation of Church and

  13. Re:Regex: won't happen by griffjon · · Score: 2

    *sigh* you're right.

    But in the case where they would implement my ability to submit a RegEx, I could give them lots of flex on the time in return for the exact one page that I want. How hard could it possibly be?
    (dodging)

    --
    Returned Peace Corps IT Volunteer
  14. Re:Prepositions need love too by griffjon · · Score: 3

    I'm just waiting for them to implement a RegEx interface. Now THAT would be some love for the geeks out here.

    --
    Returned Peace Corps IT Volunteer
  15. Dumb question (?) by DonK · · Score: 2

    I missed an answer to "How come for the last N months the Google front page has stated:

    Search 1,346,966,000 web pages

    and this number doesn't change?"

  16. Re:Prepositions need love too by King+Babar · · Score: 3
    For example, searching for: "Hail to the chief" would ignore to and the. In order to actually search for the phrase (which I indicated that I wanted to do by surrounding it in quotation marks), I would have to type "Hail +to +the chief". Hardly user-friendly.

    And, actually, that's not quite right, either. It's apparently always going to blow off your "the" (I just tried it). This is, alas, a seriously hard problem. What you were doing was looking for what actually amounts to a single chunk of information: the title of a fanfare played for the president. Unfortunately, the English version of the title is four words long although the title itself might in some cases act just like a single word (or noun phrase). So:

    That was one of the worst "Hail to the chief"s that I have ever heard.
    Yes, you might even pluralize it just like a noun. So that's one problem right there: search terms that really are tantamount to a single lexical item might be four or more words long, and might even be inflected.

    Ideally, you'd like to index separately these multi-word chunks, especially if you can prove they occur way more often than expected. So in your example, "hail" and "chief" co-occur on about 28,000 pages, while "hail" alone is on 510,000 and "chief" alone is on over 1,500,000. If Google indexes 1.5 billion pages (or so), and the terms were independent, then you'd expect something like 500 co-occurrences, and 28,000 is so outrageously out of line you would know that something is up.
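The independence estimate is just a product of probabilities, so it is easy to check. The page counts below are the ones quoted in the paragraph above; the variable names and the "lift" ratio are only for illustration.

```python
# Page counts quoted above (approximate).
total_pages = 1_500_000_000
hail = 510_000
chief = 1_500_000
observed = 28_000

# If the two words were independent:
# expected = total * P(hail) * P(chief) = hail * chief / total
expected = hail * chief / total_pages
lift = observed / expected   # how many times more often than chance

print(round(expected))  # 510
print(round(lift))      # 55
```

So the two words show up together roughly 55 times more often than chance would predict, which is the signal that they form a single chunk.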

    Now, I'm guessing that *local* co-occurrence information is eventually going to prove even handier in this regard. So, for example, "hail to" comes up 157,000 times, which is about 1/3 of all "hail" pages. That's very unlikely unless there's something systematic (and very possibly exploitable) going on.

    The big problem is that you can't really do much with function words alone, since they're just too staggeringly frequent. In running English text, the frequency of "the" is just about 70,000 per million. In other words, 7% of all English text consists of the definite article, and most web pages contain many distinct copies. You've got to kill that. Unfortunately, by omitting "the", you lose a lot of potentially useful information about definiteness of the noun phrase. In the "hail to the chief" example, the song title itself is just one example of a (somewhat) productive expression "hail to [definite-NP]", which has a specific kind of meaning implied (interestingly, usually sarcastic or abusive). Picking up on this could be very useful.

    So suppose I typed into deja "bush mass-mooning Gothenburg". I'll get 9 hits. That's nice, but google might want to do more, and provide additional examples of president (or candidate) Bush being derided in public. Or maybe give me pages that refer to the same incident being described as the Swedish version of "hail to the chief".

    So there is no doubt that function words need love, but I'd argue for a love that seeks to understand them and their weird little contributions to meaning rather than just a way to make sure you can nail a song title exactly.

    --

    Babar

  17. the technology behind google by dizco · · Score: 2

    There's an excellent presentation at technetcast by jim reese (chief operations engineer @ google) called "the technology behind google", in mp3 format. It's much more technical than this interview, really a very good listen. get it here

    --sean

  18. Re:[ot]Google's data structure? by daytrip · · Score: 3

    You'll probably get a reasonable idea at this page:

    http://www-db.stanford.edu/~backrub/google.html.

    Also, try a lookup for a bloom filter, which google uses, I think. Most search engines work by inverting the index, and then merging the lists. Taking the intersection of all the keywords gives you the membership, then you apply ranking to the membership. Pretty simple concept. I don't know of any search engines that use a trie, or use any form of stemming.
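A toy version of "invert the index, then intersect" might look like the sketch below. The documents, function names and whitespace tokenizer are all made up for the example, and the ranking step is left out entirely.

```python
from collections import defaultdict

def build_index(docs):
    """Invert doc_id -> text into word -> sorted list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Return IDs of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))   # keep only common members
    return sorted(result)

# Hypothetical three-page "web".
docs = {
    1: "cheese fondue recipe",
    2: "cheese board",
    3: "chocolate fondue and cheese fondue",
}
index = build_index(docs)
print(search(index, "cheese fondue"))  # [1, 3]
```

The list returned by `search` is the "membership"; a real engine would then order it with whatever ranking function it uses.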

    -js

  19. Re:Smarter Searches by Xofer+D · · Score: 2

    Last semester, I did a directed study about applying approximate machine reasoning to human information access, specifically to searching hypertexts of metadata. One of the ideas I looked at was an article about a search engine called FuzzyBase (pdf) which was developed by three people including my professor, who works in the SFU Communication Networks Laboratory. FuzzyBase did just what you suggest - it used an interactive user session to disambiguate user queries. There are several interesting technologies which use this sort of thing to obtain unambiguous search keys, and most involve the usage of semantic ontologies. If you want to get started looking at this stuff, have a look at some of the articles on this page, especially the online links at the end of the page. There are already search engines that do this to some degree.

    --
    The Signal/Noise ratio can be improved in two ways. Remaining silent is the OTHER way.
  20. German queries at fireball.de by harmonica · · Score: 2

    German search engine fireball.de has a page that lets you see what others have requested in the last 30 seconds. There are some sick people out there...

  21. MP3 of that talk by harmonica · · Score: 3

    You probably mean The Technology Behind Google. It's a 73 min MP3, very interesting!

  22. Re:Yeah Suckah! by htmlboy · · Score: 5

    Google gave a talk for ACM here last semester (got a t-shirt, woohoo!). The speaker described how they're used. They have thousands of linux boxes, and they're used to store websites (to be searched and cached copies) and to do searching on the pages they have (I think that's how it went). I got the impression that linux is used because it's free (important with thousands of licenses), it's reliable, and they found it a good platform for the searching backend software.

    an interesting side note: they found that when one of the linux boxes stops working, it's more cost effective to replace it than to fix the problem (hardware, at least). google throws out a lot of good hardware because of that. the lecture hall was begging for a student donation program of some sort when the google guy mentioned that :)

    chris

  23. Send messages to the staff! by dead_penguin · · Score: 5

    With the giant display of scrolling queries (filtered, though) they have in their lobby, I think it's time to start sending little messages to the Google staff using searches.

    "Help, I'm stuck in here!!" is an obvious classic to try. If enough of us do it, it might even get noticed...

    "Intelligence is the ability to avoid doing work, yet getting the work done".

    --

    It's only software!
    1. Re:Send messages to the staff! by Tofuhead · · Score: 2

      Before I read your post, I had the same idea. I just sent one that said "Sorry, am I DOSing the Google lobby scroller?" Then, after reading this post, I did a search for "jerk off technique."

      Hope those scroller babies don't log IPs. It would look like I was so bored (at work right now) that I decided to SPAM their scroller, which had somehow gotten me into some kind of masturbatory mood.

      < tofuhead >
      --
      It is still the dark of night.
  24. Re:Smarter Searches by gorilla · · Score: 2

    Google already has this. If you do a search on 'slishdot' it asks you if you meant slashdot.

  25. Re:[ot]Google's data structure? by costas · · Score: 2

    A speculative answer since b-trees are my bread and butter (I am just now specing a 2TB data-mine): hundreds of thousands of entries (or hundreds of millions) should not really bother a b-tree. From the articles about Google, I am guessing they have implemented some sort of distributed b-tree app server, across all those COTS linux boxes.

    I am curious as to what kind of implementation they are using; Google's roots would suggest some hacked form of Berkeley DB with lots of performance improvements.

    Oh, well, just some guesswork... if I am close, I am expecting a job offer by the way :-)...

  26. Re:Smarter Searches by Louis+Savain · · Score: 2

    Interesting work. Thanks for the helpful links.

  27. Re:Smarter Searches by Louis+Savain · · Score: 2

    Google already has this. If you do a search on 'slishdot' it asks you if you meant slashdot.

    Thanks for this suggestion. Although it is a good example of interaction between the engine and the user, it seems to be based on a simple spelling check. Rather, I was thinking more in terms of what Monika Henzinger referred to as a topic based query. For example, typing 'bicycle' and receiving a choice of 'bicycle repair', 'bicycle racing', 'bicycle sales', 'bicycle parts', 'bicycle touring', etc...

  28. Re:Smarter Searches by Louis+Savain · · Score: 2

    Thanks for the info on Excite's zoom feature. I am impressed. I wonder how they go about creating their topic associations. Do they compile it manually or do they have an automated tool that searches previous user inputs to come up with the most common keyword associations? An automated tool would, of course, be much more efficient and cheaper to operate.

  29. Smarter Searches by Louis+Savain · · Score: 4

    Monika Henzinger: You can try to return documents that are specifically on this topic. We're developing more sophisticated techniques to return documents that might not mention the query words, but are [still relevant to] the topic. We're getting away from just pure word matches and getting more into topics.

    This is interesting. I wonder if there might be a way for the engine to have a two way back-and-forth "conversation" with the user. IOW, if the engine interprets the query to have several possible meanings, a few multiple choice questions might clarify the meaning and narrow the search parameters. I think this could be more helpful than doing a blind guess of the user's intention.

    1. Re:Smarter Searches by Fencepost · · Score: 2
      I wonder if there might be a way for the engine to have a two way back-and-forth "conversation" with the user. IOW, if the engine interprets the query to have several possible meanings, a few multiple choice questions might clarify the meaning and narrow the search parameters.

      I believe it was Altavista that had (and may still have, though I don't see any sign of it) something along these lines - after a query, it would also present an option to narrow the query by selecting some other key words that appeared in some of the pages. If I recall correctly this was not on the main query results pages, but there was a link to it.

      For the example someone posted earlier where he gets a lot of hits from people looking for masturbation tips, using that option would present you with several groupings of words - one group might include "masturbate" and other terms likely to be found on that sort of pages, another group might include "network," "security," and "adware." Each group and each word within a group had a checkbox that could be used to select additional words to use in limiting the search.

      I suspect that this was dropped for load reasons, though I could be wrong - it may be that people just didn't use it and they decided it wasn't worth the hassle.

      -- fencepost

      --
      fencepost
      just a little off
  30. phone book function by spasm · · Score: 2

    Dunno if anyone's noticed the new 'phone book' function - type "your name" {your city/state/zip code} if you live in north america and see what comes back as the first google find. Your home address & phone number, at least if you're in the phone book.

    I first noticed this function when searching for information on the professional work of someone who I was going to be working with - and the #1 thing google spat up was his home address and phone number. I know I could have found this almost immediately if I went actively looking for it, but it was a bit creepy anyway. I guess the reason I'm disturbed is that it wouldn't have occurred to me to go looking for that information, but once it was thrust in my face like that, I could immediately think of reasons it might be handy to have it.. In the event, I didn't copy it down anywhere, but, well, I could think of people who wouldn't hesitate to call me at 3am if they had my home number..

    Fortunately google seems willing to at least let you opt out - http://www.google.com/help/pbremoval.html - which is fine for people who know about google and its more esoteric functions, but ain't going to help Jane Shmoe when she starts wondering why so many more people seem to know where she lives and what her home number is - people who wouldn't necessarily have gone looking for the information (that would be rude..) but who don't mind having it when it's 'handed' to them.

    1. Re:phone book function by freeweed · · Score: 2
      Doesn't seem to work for me up here in Canada, although my name does come up with some interesting stuff that I've never seen online before :)

      As for not having your phone number/address on the internet... that's why the phone companies are required by law to allow you to de-list. Without the internet, it takes me all of 5 minutes to drive to my local library, where they have phone books from around the world for the taking. Oh yes, and the white pages here only list first initial anyway :)

      --
      Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  31. Re:Prepositions need love too by LocalYokel · · Score: 3
    Search terms have all kinds of problems.

    I had the same problem yesterday when I was searching for "quotes about Shakespeare". "to be or not to be" (with quotes) pulls up the proper category, but the first result it comes up with is the GNU homepage, because GNU's not Unix!. The second link is to Am I Hot or Not, BTW...

    Strangely enough, it warns about "or", and if I want to use it in a search, it must be in CAPS, but then how do I search for something in ORegon? For some reason, it says nothing about "not", so I don't know what's up with their search terms anymore.

    --
    E2 IN2 IE?

  32. Prepositions need love too by zpengo · · Score: 4
    A recent development in Google technology left me very dismayed -- They started ignoring "common words."

    This makes sense on a general level, but when you try searching for a phrase embedded in quotation marks, it's frustrating to have Google decide which parts of a literal string to search for and which to ignore. If I had wanted it to ignore parts of it, I wouldn't have indicated that it was a literal phrase, dangnabbit!

    It is possible to include words that you typed in the search phrase, but you have to add an Altavista-style '+' before it.

    For example, searching for: "Hail to the chief" would ignore to and the. In order to actually search for the phrase (which I indicated that I wanted to do by surrounding it in quotation marks), I would have to type "Hail +to +the chief". Hardly user-friendly.

    Oh, well.

    --


    Got Rhinos?
    1. Re:Prepositions need love too by BitchAss · · Score: 2

      I tried your search for "to be or not to be" using the +'s in front and I got this back:

      Google always searches for pages containing all the words in your query, so you do not need to use + in front of words. [details] The word "or" was ignored in your query -- for search results including one term or another, use capitalized "OR" between words.[details] The following words are very common and were not included in your search: to be to be. [details]

      That seems so pointy-haired-bossish.

      --
      Like sex? Read and write about it! Indecent Blogging
  33. Northernlight by JPMH · · Score: 2

    Northernlight categorises its returns into "Custom Search Folders", subject by subject.

  34. Here is the real google info... by jwater · · Score: 5
    Here at Slashdot it seems like people only can complain about a service. Most of the posts are rants without understanding of the dynamics below them.

    I think we all could use more understanding of the topic. A link to the paper that started it all here.

    1. When was the last time that "to" or any other preposition helped the average query? Your Grandmother does not know that this word is meaningless 99.9% of the time, so google tries to improve their relevancy.

    2. Google has not sold out. Their ads are the most simple in the industry. They give access to users like you and me at a reasonable rate. Who wants to wait for 345x123 pixel banner ads anyways?

    3. Have you noticed the spelling feature? Google will correct your spelling. This is a function of the tons of bigrams that they have stored.

    4. Here is a link to more papers [Warning: Technical] here.

  35. Re:More on language translation... by FTL · · Score: 2
    > (translated from English to Korean
    > and then back to English again)

    And that's the catch. Most documents are readable after they've been put through the blender once. But two passes through the blender results in garbage.

    The Fish is quite good for the one-way trips that it was designed for. A round trip ticket through the Fish is usually deadly.
    --
    Slashdot monitor for your Mozilla sidebar or Active Desktop.
  36. read the article by mr_gerbik · · Score: 2

    They filter what gets projected.. maybe you should have read the next sentence before posting.

    "That's a filtered version, except that the filter doesn't work well in other languages. So we had people here from BMW, and they told me that there were some German queries that got through that shouldn't have.

    [Note to self: Curse on Google only in foreign tongues.]"

    1. Re:read the article by mr_gerbik · · Score: 2

      Have you ever had experience with filtering software? Any filtering software worth 2 cents looks for that kind of shit.. purposeful misspellings, replacements like 0s for Os, 1s for ls. I think Google is smart enough to make a filter like this. So no.. "britney spears suk1ng c0ck" isn't going to get through. Beyotch.

    2. Re:read the article by Rogerborg · · Score: 2
      • They filter what gets projected.. maybe you should have read the next sentence before posting.

      Uh huh, and maybe you should have read the trailing ;) before replying.

      ;)

      --
      If you were blocking sigs, you wouldn't have to read this.
  37. Re:why *I* like google by mr_gerbik · · Score: 2

    AND...

    Mac only searches.. and a cool Mac logo!
    http://www.google.com/mac

    AND...

    US Government searches... and a "cool" US logo?
    http://www.google.com/unclesam

  38. why I like google by mr_gerbik · · Score: 3

    who else has linux only searches?.. and not only that, a cool linux google logo!

    http://www.google.com/linux

    -gerbik

  39. SatireWire: interview with Jeeves by mrBlond · · Score: 2

    http://www.satirewire.com/features/satire-jeevesinterview.shtml
    --
    mrBlond (I don't email from Malaysia)

    --
    CowboyNeal for president!
    "Hit any user to continue."
  40. Re:Disturbing Search Requests by don_carnage · · Score: 2

    You probably should check out this site: Disturbing Search Requests

    --

  41. Re:Disturbing Search Requests by don_carnage · · Score: 2
    I can't remember where I found that -- it may have even been here on /.

    It kinda makes you want to start checking those referer logs, eh? I once found one that was looking for 'priceless pissing'. No clue how they ended up on my site!

    $ grep google /usr/apache/logs/referer_log

    --

  42. Yahoo took a much bigger leap - it licensed Google by arete · · Score: 2

    What Yahoo did was license google, instead of what they were doing before, licensing Inktomi. Google rocks.

    http://news.cnet.com/news/0-1005-200-5561996.html

    --
    Looking for freelance Actionscript (Flash/Flex) or ColdFusion work and/or freelance developers. Email me, put Slashdot
  43. What do you expect, a monolith? by arete · · Score: 2

    Yahoo is repackaging existing services - they're repackaging Google. And yahoo has more name recognition, so more people use it. And they bring in more revenue in ads, so more money goes to Google to develop.

    Google OTOH, is developing new technology. Most of that development is incremental -things get better and better. Until we actually find an alien monolith to give us all our science, this is how most advancements happen.

    --
    Looking for freelance Actionscript (Flash/Flex) or ColdFusion work and/or freelance developers. Email me, put Slashdot
  44. Re: weird google pages (was "why *I* like google") by wishus · · Score: 2

    www.google.com/redhat - Doesn't do anything special, but the URL is there
    www.google.com/palm - Looks to be made for monochrome PDA browsers
    www.google.com/ie - For Pocket IE maybe?
    wishus
    ---

  45. method for increasing hits by jvj24601 · · Score: 2

    A friend of mine (web developer) says that he's created a way to increase the hit count among all the sites he creates. He uses a server-side Perl script to determine if the Google bot is hitting a page, and includes links to *all* of the sites' homepages that they are hosting. So if he includes this script on every page of every site he hosts, then every page links to every site.

    Does this work? I mean, they include (in plain English) something like "Here are some of the other sites we, [our web design firm], created and host" along with a short blurb. It sounds like it would work, right?

  46. new search engine by Aalschover · · Score: 2

    This company claims they are writing the new search engine for Google. Click on clients and then #6.

    It really says 'To fullfill their needs, we built a brand new searcg engine for Google.....'

    [flash alert]

  47. Re:isn't Google always getting itself in the news? by markov_chain · · Score: 2
    There is nothing technological that Google is doing that isn't done by other engines (Excite, Hotbot).

    Really. Google uses a patented ranking algorithm, described by Page and Brin (Stanford graduate students who founded Google) in a paper titled The PageRank Citation Ranking: Bringing Order to the Web (1998). The algorithm does very well at recognizing relevant documents. Last I looked, other search engines used mostly sets of hand-tuned hacks which did not do as well. Has this changed? I'd appreciate some references, refereed if possible.

    ~

    --
    Tsunami -- You can't bring a good wave down!
  48. More on language translation... by sdo1 · · Score: 3

    These translation services (such as BabelFish on AltaVista) still have quite a way to go before they're completely reliable. Especially when you translate from one language to another, you might end up with something similar to this (translated from English to Korean and then back to English again):

    Will be complete and on the front of the L it will be reliable to translation service (as the BabelFish is same) a yet positively is thin method to Altavista. It was special and when you from one language also translate in different one thing, you in child one silence comfort ended to this, (and the that time English back mac tayn Great Britain from again under translate again in a Korean):

    -S

    --
    --- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
  49. Regex: won't happen by brlewis · · Score: 2

    You can pre-build lists of matches by word, but regex is too general a concept. You can't pre-build an index that will help speed up a query based on some yet-to-be-specified regex. There's just no way to do it fast.
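To make the contrast concrete, here is a small sketch under the assumption that the only precomputed structure is a per-word index. The documents and function names are made up, and the regex path shown is the naive fallback, not a claim about what any real engine does.

```python
import re

# Hypothetical document collection.
docs = {
    1: "hail to the chief",
    2: "hail storm warning",
    3: "chief of staff",
}

# Built ahead of time: word -> list of document IDs.
word_index = {}
for doc_id, text in docs.items():
    for word in set(text.split()):
        word_index.setdefault(word, []).append(doc_id)

def word_search(word):
    """One dictionary lookup, no matter how many documents there are."""
    return word_index.get(word, [])

def regex_search(pattern):
    """The word index is no help here: every document gets scanned."""
    rx = re.compile(pattern)
    return [doc_id for doc_id, text in docs.items() if rx.search(text)]

print(word_search("hail"))         # [1, 2]
print(regex_search(r"hail\s+to"))  # [1]
```

With three documents the scan is free; with a billion pages it is the whole problem.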

  50. [ot]Google's data structure? by wrinkledshirt · · Score: 3

    Okay, this is so off-topic it's not even funny.

    Anybody have an inkling of a clue of the data structure that Google uses (or probably uses) to store all its words? I was just thinking that maybe it was some sort of balanced binary tree with each node containing a word, two pointers to the next two words further down the tree, and the root of a linked list of all the pages that word is contained in? I know binary search trees are supposed to be fast, but I was wondering if that'd be good enough for something with probably hundreds of thousands of words?

    I'm assuming they're not using some sort of sql LIKE "%searchword%", I can't imagine any kind of cluster that could speed that process up, although I don't really know all that much about the process or what the main benefits of clustering are.

    Anyway, hugely sorry for the offtopic post, it's just something that's been on the brain lately...

    --

    --------
    Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...

    1. Re:[ot]Google's data structure? by blamanj · · Score: 5

      Probably they use a trie or the related Patricia tree. These are very space efficient and relatively fast.

  51. Search 1,346,966,000 web pages by thgood · · Score: 2
    Do you know how long it has been since they changed that number on their homepage????

    I emailed Google about it and they gave me some crap about it being too difficult....

    What the mess...

  52. Much like McDonalds by freeweed · · Score: 2
    They've been claiming '99 billion served' for several years now. Either they have a Y2Kish problem with their signs, or they're about to unleash the biggest wave of advertising the world has ever seen.

    One Hundred Billion Served! Could become as common as that evil Castaway DVD commercial that's repeated at least 50 times a night on TV.

    --
    Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  53. Yeah Suckah! by Louis_Cyphier · · Score: 2

    We're all aware of the fact that google r0x0rs, but one thing I've always been curious about Google and their "linux boxen" is, do they only use Linux for their servers, or do they have other practical uses, IE Quake Servers, and just workstations, or is Linux used only for price reasons? Anyone know?
