Google's Technology Explored

Posted by Zonk on Thursday March 3, 2005 @05:51AM from the googling-google dept.

RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
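
The routing idea in the quoted passage can be sketched roughly as follows. This is only an illustration under stated assumptions, not Google's actual design: the `ReplicaSet` class, the `route` function, the set names, and the capacity numbers are all hypothetical. Each replica set stands for one full copy of the index, and a query goes to the first set that is not busy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaSet:
    """One complete copy of the index (hypothetical model, not Google's)."""
    name: str
    capacity: int       # assumed limit on concurrent queries per set
    in_flight: int = 0  # queries currently being served by this set

    def busy(self) -> bool:
        return self.in_flight >= self.capacity


def route(query: str, replica_sets: List[ReplicaSet]) -> ReplicaSet:
    """Hand the query to the first replica set that is not busy."""
    for replica_set in replica_sets:
        if not replica_set.busy():
            replica_set.in_flight += 1
            return replica_set
    raise RuntimeError("all replica sets are busy")


# Two full copies of the index, each able to serve one query at a time.
sets = [ReplicaSet("set-a", capacity=1), ReplicaSet("set-b", capacity=1)]
first = route("search tree", sets)
second = route("bay area cooking class", sets)
print(first.name, second.name)
```

With these made-up numbers the first query lands on set-a and the second spills over to set-b, which is the sense in which replication kept for fail-safe purposes also adds throughput.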

PigeonRank(TM) by Kimos · 2005-03-03 05:53 · Score: 5, Funny

That's not how google does it! This is their REAL secret:
http://www.google.com/technology/pigeonrank.html
1. Re:PigeonRank(TM) by Tackhead · 2005-03-03 05:58 · Score: 5, Funny
  
  > That's not how google does it! This is their REAL secret: http://www.google.com/technology/pigeonrank.html
  That was pre-IPO.
  We'd like you to meet Bubba. Bubba's fully vested, and as this article says, he's, uh... he's grown somewhat.
2. Re:PigeonRank(TM) by eric_brissette · 2005-03-03 05:58 · Score: 5, Funny
  
  Their technology for waste management alone must be revolutionary.
/. effect by Anonymous Coward · 2005-03-03 05:55 · Score: 4, Funny

If we could /. google, that would impress me
1. Re:/. effect by SmokeHalo · 2005-03-03 06:01 · Score: 5, Interesting
  
  It's been tried. From TFA:
  
  One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.
  
  --
  I'm not good in groups. It's difficult to work in a group when you're omnipotent. - Q
2. Re:/. effect by Anonymous Coward · 2005-03-03 06:16 · Score: 5, Funny
  
  Computer programming languages are great, and I love them, but that does not mean that you have to use them for everything
  
  open browser at www.google.com
  get a drinking duck thing that bobs up and down hitting F5 every second
  
  seems better to me.
Truly Amazing. by iibbmm · 2005-03-03 05:55 · Score: 5, Interesting

It really is amazing to think of the amount of information and data that we can access so quickly these days. When I stop and think about what my little search query goes through to bring me an almost instant response, it almost seems impossible. Of course the search engine side of this is only one example, but it's a nifty insight into how powerful our infrastructure is these days. Bravo, mankind.
Whats really impressive by mattmentecky · 2005-03-03 06:00 · Score: 5, Funny

The technology that is truly astounding is Google's ability to cache itself. Yeah, think about THAT one for a while.
1. Re:Whats really impressive by MillionthMonkey · 2005-03-03 06:33 · Score: 4, Funny
  
  I don't see what's astounding about this.
  
  Reminds me of a radio interview I once heard with the Google founders. The host was curious about what the "I'm feeling lucky!" button was about. She claimed she typed "Google" into the search box and clicked "I'm feeling lucky!", and nothing happened, so it didn't work!
Also Amazing: How much we miss by Ieshan · 2005-03-03 06:01 · Score: 5, Interesting

It's also amazing how much of the general knowledge of the world we *can't* access, because it's unconnected or unpublished.

Just think about how vast and extensive Google's search is, and then think about how little of the World's knowledge and creative achievement it actually can access.

The quantity and breadth of human knowledge is breathtaking, no?
1. Re:Also Amazing: How much we miss by iibbmm · 2005-03-03 06:05 · Score: 5, Insightful
  
  That's why projects like wikipedia are so important, and so impressive.
  
  Only a few years ago it could take forever to find any kind of decent information on some topics online or even in libraries. Today, I go to wiki and I'm almost assured to have a FAIRLY reliable source for information, as it's cross checked by peers who have some kind of a personal interest in the subject.
  
  However, there's a downside.
  
  Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.
More useless search results? by SerialEx13 · 2005-03-03 06:01 · Score: 4, Insightful

so that pages can match even if none of the words in your query actually appear on the page.

Even pages that come up in my search results now that contain my query don't even have anything to do with what I am looking for. Isn't this just adding to the problem?

How about a Did you mean? option that doesn't compare against spelling, but related topics instead?
1. Re:More useless search results? by InfiniteWisdom · 2005-03-03 06:18 · Score: 4, Informative
  
  It says they're using clustering, so it might help eliminate pages that contain the words you're looking for but aren't relevant to your current query, in addition to including pages that are relevant but don't contain the words.
  
  For example, the word "tree" may refer either to a data structure (binary, B-, red-black, etc.) or to the stuff forests are made of. If my query is "search tree", the words search and tree may show up on a page about people searching for some kind of a tree and on pages about search trees. Assuming they're both popular classes of pages, you're going to end up with some mishmash of results from both classes.
  
  Instead, the clustering algorithm might notice (based on other words that appear on the pages, for example) that pages with 'search' and 'tree' in them fall into two classes. That doesn't help if "search tree" is all it has to go by. But now if I add the words "data structure" to the query, it knows which class of pages I'm interested in, because many pages about binary trees contain the words "data structure" whereas almost none about the quest for trees do. Now it can return pages from the right cluster that it knows are relevant, even if they don't contain the words "data structure" in them.
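  
  A toy version of the "search tree" example, purely as a sketch. The clusters here are hand-written and hypothetical (a real system would learn them from the pages), and the scoring rule, counting how many query words appear anywhere in a cluster, is an assumption made for illustration rather than anything the article describes.

```python
# Hypothetical, hand-labelled clusters; a real system would learn these.
CLUSTERS = {
    "computer science": [
        "binary search tree data structure insert lookup",
        "red-black tree balancing rotation",  # lacks "search" and "data structure"
    ],
    "forestry": [
        "search for the oldest tree in the forest",
        "tree planting guide for gardeners",
    ],
}


def cluster_score(query, pages):
    """Count how many query words occur anywhere in the cluster's pages."""
    vocab = set(" ".join(pages).split())
    return sum(1 for word in query.split() if word in vocab)


def pick_cluster(query):
    """Return the name of the cluster whose vocabulary best matches the query."""
    return max(CLUSTERS, key=lambda name: cluster_score(query, CLUSTERS[name]))


# "search tree" alone scores the same in both clusters, so it cannot decide.
tie = [cluster_score("search tree", pages) for pages in CLUSTERS.values()]

# Adding "data structure" tips the balance toward one cluster.
best = pick_cluster("search tree data structure")
results = CLUSTERS[best]
print(tie, best)
```

  Once the cluster is chosen, every page in it is a candidate result, including the red-black tree page, which contains neither "search" nor "data structure".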
no AND needed by tehshen · 2005-03-03 06:01 · Score: 4, Interesting

From the summary:

they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

From the help guide:

By default, Google only returns pages that include all of your search terms.

Which of these is correct? If it's the summary, is there any way to turn this behaviour off? I find it immensely annoying.

--
Guy asked me for a quarter for a cup of coffee. So I bit him.
1. Re:no AND needed by Ironsides · 2005-03-03 06:09 · Score: 4, Informative
  
  they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.
  
  I think what they mean is that they are working on search algorithms that will implement this. Not that they have already made it publicly available. They want it to work first, and be released second. The problem that you have cropping up most likely occurs with pages that put info in the metadata, and hence don't show up in the page itself.
  
  --
  Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
Oops by Daedala · 2005-03-03 06:02 · Score: 5, Funny

Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cooking" is a good match even though it contains none of the query words.

One word: cooking.

I'm sure the principle is sound. I just think the example is a leetle bit flawed.

--
What I say does not represent the views of my employers, my friends, my cats, or myself.
Video about some of the backend stuff by otisg · 2005-03-03 06:04 · Score: 5, Interesting

Here it is, from one of the Google guys:
Google: A Behind-the-Scenes Look.

--
Simpy
Google Lunar by Barryke · 2005-03-03 06:05 · Score: 4, Funny

They're hiring.
http://www.google.com/jobs/lunar_job.html
a snippet:
Google Copernicus Center is hiring
Google is interviewing candidates for engineering positions at our lunar hosting and research center, opening late in the spring of 2007. This unique opportunity is available only to highly-qualified individuals who are willing to relocate for an extended period of time, are in top physical condition and are capable of surviving with limited access to such modern conveniences as soy low-fat lattes, The Sopranos and a steady supply of oxygen.

--
Hivemind harvest in progress..
Question... by kryogen1x · 2005-03-03 06:07 · Score: 4, Interesting

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

Do they share these patches with everyone else?
1. Re:Question... by limbostar · 2005-03-03 06:42 · Score: 5, Informative
  
  They're not obligated to share unless they are planning on redistributing the software. They are perfectly free to patch their own software and use the patched versions for their servers without sharing those modifications.
  
  The GPL does not force them to do anything unless they wish to redistribute the software.
  
  --
  this is a sig.
Sure? by ferar · 2005-03-03 06:08 · Score: 5, Funny

I always thought that they used NT + Access Database.
gCluster by RobiOne · 2005-03-03 06:08 · Score: 5, Informative

They should make a googleCluster Live CD.. ala clusterKnoppix.. ..or perhaps use more of clusterKnoppix features or openmosix..share cpu/mem..
sourceforge is begging for something like this..

Their engineer desktops have special google builds of linux which help them compile things insanely fast with g4, ie hacked p4 (Perforce).

They also have one of the best intranet sites I've seen. Lots of info and services the employees can use, apart from email.

The internal blogs really help with keeping track of projects you're not working on, and what others are doing. Their mailing lists are often useful too, for example there's a lost and found, for sale, and biking partners list. All kinds of useful little stuff, taking care of the people with little nice things. Lots of reading too.

-- Robi

--
-- Robi
kernel patches? by alphan · 2005-03-03 06:08 · Score: 4, Insightful

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.
and the obvious question:
where are the patches?
Anybody know? This is not a GPL question, just an ethical one.
"The text you entered was not found." by Doc+Ruby · 2005-03-03 06:10 · Score: 4, Interesting

" pages can match even if none of the words in your query actually appear on the page"

The main flaw I've found in Google's results has been when it returns pages without one of my query words, which don't respond to the sense of my query. Sometimes it's changed page content at the same URL, so I go back and get the "cached" page, if it exists. The cached pages reveal in their headings whether the page matched only because the query word was found only in another page linking to the returned page. I'd like their immediate results to show that distinction, and to have links in the results to click around those pages related by my complete query. The current click/back/"cache" combinations are frustratingly disconnected, conflicting with Google's otherwise smooth immediacy.

--
make install -not war
Google Maps - Designed to protect data centres by Matt+Clare · 2005-03-03 06:14 · Score: 5, Funny

Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

"You don't have just one data center," he said, "you have multiples."

The real idea behind Google Maps is so that as the server catches fire it uses its last cycles to send an eMail to the nearest fire chief and include a map. I think it would also throw in a GMail invite for incentive.

--
.\.\att Clare
Re:define: cheap machines by canadiangoose · 2005-03-03 07:06 · Score: 5, Interesting

I read somewhere that early Google datacentres were built by filling their racks with plywood shelves, then filling each shelf with one power supply running four motherboards each with one HDD. They didn't even use cases. This allowed them to build massively dense datacentres very cheaply. At one point they decided it wasn't worth it to replace dead hardware, so they started placing the racks too close together to be accessible. Why dig through and replace things when you can just keep adding more?
Anyhow, the article mentioned that in these early datacentres they experienced something like a 25% hardware failure rate, but that it didn't matter because the software worked around it and the hardware was cheap.
Here's a link to the page where I read all this neat stuff. It's probably mostly about the same stuff as the article we've all just slashdotted, but I won't be able to tell for a while....

--
Never eat more than you can lift -- Miss Piggy
Re:Laziness, ignorance or by Kashif+Shaikh · 2005-03-03 08:19 · Score: 4, Insightful

None of the concepts of computer science are new, but what is ground breaking is Google touching all aspects of computer science to solve a problem. Distributed Databases, Replicated Filesystems, Clustering, Learning algorithms, job scheduling, map/reduce languages, etc. are not new. But they applied each of these sub-domains to 'searching' and 'lots of data'. Using old ideas in _new_ ways is ground breaking. That's what everyone does (like Carmack and DOOM3).