Google's Technology Explored

Posted by Zonk on Thursday March 3, 2005 @05:51AM from the googling-google dept.

RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
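
The routing described in the quote can be sketched roughly as follows. This is only a toy illustration of the idea, not Google's actual code: the names `ReplicaSet`, `route`, and `make_set`, the `busy` flag, and the tiny term-to-document shards are all assumptions made up for the example. Each replica set holds one full copy of the index split across several shard servers, and a query goes to the first set that is not busy.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSet:
    """Hypothetical: one complete copy of the index, split across shard servers."""
    name: str
    shards: list  # each shard is a dict mapping term -> list of document ids
    busy: bool = False

    def search(self, term):
        # Answering a query means consulting every shard in the set,
        # because only the whole set together holds the full index.
        hits = []
        for shard in self.shards:
            hits.extend(shard.get(term, []))
        return sorted(hits)


def route(term, replica_sets):
    """Send the query to the first replica set that is not busy."""
    for replica_set in replica_sets:
        if not replica_set.busy:
            return replica_set.name, replica_set.search(term)
    raise RuntimeError("all replica sets are busy")


def make_set(name, busy=False):
    # Every set carries the same data: that is what makes it a replica.
    shards = [{"linux": ["doc1"]}, {"linux": ["doc7"], "kernel": ["doc3"]}]
    return ReplicaSet(name, shards, busy)


sets = [make_set("set-a", busy=True), make_set("set-b")]
served_by, results = route("linux", sets)
print(served_by, results)
```

Because every set holds the same index, the replicas that exist for fail-safe reasons also add capacity: here `set-a` is busy, so the query is simply answered by `set-b`.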

8 of 294 comments

Truly Amazing. by iibbmm · 2005-03-03 05:55 · Score: 5, Interesting

It really is amazing to think of the amount of information and data that we can access so quickly these days. When I stop and think about what my little search query goes through to bring me an almost instant response, it almost seems impossible. Of course the search engine side of this is only one example, but it's a nifty insight into how powerful our infrastructure is these days. Bravo, mankind.
Also Amazing: How much we miss by Ieshan · 2005-03-03 06:01 · Score: 5, Interesting

It's also amazing how much of the general knowledge of the world we *can't* access, because it's unconnected or unpublished.

Just think about how vast and extensive Google's search is, and then think about how little of the World's knowledge and creative achievement it actually can access.

The quantity and breadth of human knowledge is breathtaking, no?
Re:/. effect by SmokeHalo · 2005-03-03 06:01 · Score: 5, Interesting

It's been tried. From TFA:

One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

--
I'm not good in groups. It's difficult to work in a group when you're omnipotent. - Q
no AND needed by tehshen · 2005-03-03 06:01 · Score: 4, Interesting

From the summary:

they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

From the help guide:

By default, Google only returns pages that include all of your search terms.

Which of these is correct? If it's the summary, is there any way to turn this behaviour off? I find it immensely annoying.

--
Guy asked me for a quarter for a cup of coffee. So I bit him.
Video about some of the backend stuff by otisg · 2005-03-03 06:04 · Score: 5, Interesting

Here it is, from one of the Google guys:
Google: A Behind-the-Scenes Look.

--
Simpy
Question... by kryogen1x · 2005-03-03 06:07 · Score: 4, Interesting

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

Do they share these patches with everyone else?
"The text you entered was not found." by Doc+Ruby · 2005-03-03 06:10 · Score: 4, Interesting

" pages can match even if none of the words in your query actually appear on the page"

The main flaw I've found in Google's results is when it returns pages that lack one of my query words and so don't respond to the sense of my query. Sometimes the cause is that the page content at that URL has changed, so I go back and get the "cached" page, if it exists. The cached page's header reveals whether the page matched only because the query word was found in another page linking to the returned page. I'd like the immediate results to show that distinction, and to include links for clicking around the pages related by my complete query. The current click/back/"cache" combinations are frustratingly disconnected, conflicting with Google's otherwise smooth immediacy.

--
make install -not war
Re:define: cheap machines by canadiangoose · 2005-03-03 07:06 · Score: 5, Interesting

I read somewhere that early Google datacentres were built by filling their racks with plywood shelves, then filling each shelf with one power supply running four motherboards each with one HDD. They didn't even use cases. This allowed them to build massively dense datacentres very cheaply. At one point they decided it wasn't worth it to replace dead hardware, so they started placing the racks too close together to be accessible. Why dig through and replace things when you can just keep adding more?
Anyhow, the article mentioned that in these early datacentres they experienced something like a 25% hardware failure rate, but that it didn't matter because the software worked around it and the hardware was cheap.
Here's a link to the page where I read all this neat stuff. It's probably mostly about the same stuff as the article we've all just slashdotted, but I won't be able to tell for a while....

--
Never eat more than you can lift -- Miss Piggy