Google's Technology Explored

Posted by Zonk on Thursday March 3, 2005 @05:51AM from the googling-google dept.

RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
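
The routing idea in the quoted passage (each replica set holds one full index, and a query goes to the next set when one is busy) can be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexReplica`, `ReplicaRouter`, and the "fewest queries in flight" rule are hypothetical names and choices, not anything the article attributes to Google.

```python
from dataclasses import dataclass


@dataclass
class IndexReplica:
    """Hypothetical stand-in for one complete set of index servers."""
    name: str
    in_flight: int = 0  # queries this replica set is currently serving


class ReplicaRouter:
    """Send each query to the least busy replica set (an assumed policy)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        if not self.replicas:
            raise ValueError("need at least one replica set")

    def dispatch(self, query: str) -> IndexReplica:
        # Pick the replica set with the fewest queries in flight;
        # ties go to the earliest one in the list.
        replica = min(self.replicas, key=lambda r: r.in_flight)
        replica.in_flight += 1
        return replica

    def complete(self, replica: IndexReplica) -> None:
        # Called when a replica set has finished answering a query.
        replica.in_flight -= 1


set_a = IndexReplica("set-A")
set_b = IndexReplica("set-B")
router = ReplicaRouter([set_a, set_b])

first = router.dispatch("linux kernel patches")   # set-A is idle, so it is chosen
second = router.dispatch("commodity hardware")    # set-A is busy, so set-B is chosen
router.complete(first)                            # set-A frees up again
third = router.dispatch("replicated filesystem")  # routed back to set-A
```

The same replication serves both purposes the article mentions: if one set fails or is busy, another complete copy of the index can still answer.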

More useless search results? by SerialEx13 · 2005-03-03 06:01 · Score: 4, Insightful

so that pages can match even if none of the words in your query actually appear on the page.

Even the pages that come up in my search results now, which do contain my query, have nothing to do with what I am looking for. Isn't this just adding to the problem?

How about a "Did you mean?" option that compares against related topics instead of spelling?
Re:Also Amazing: How much we miss by iibbmm · 2005-03-03 06:05 · Score: 5, Insightful

That's why projects like wikipedia are so important, and so impressive.

Only a few years ago it could take forever to find any kind of decent information on some topics online or even in libraries. Today, I go to wiki and I'm almost assured to have a FAIRLY reliable source for information, as it's cross checked by peers who have some kind of a personal interest in the subject.

However, there's a downside.

Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.
kernel patches? by alphan · 2005-03-03 06:08 · Score: 4, Insightful

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.
and the obvious question:
where are the patches?
Anybody know? This is not a GPL question, just an ethical one.
1. Re:kernel patches? by The Bungi · 2005-03-03 06:41 · Score: 3, Insightful
  
  where are the patches?
  They'll tell you as soon as you point out where or how they are distributing them (yes, that's why it wasn't a GPL question).
  Why should Google be "ethical"? Likely these modifications are part of their IP trove, which keeps them ahead of the (already heated up) competition.
  If you don't like the way someone uses the software you're giving away then perhaps you shouldn't give it away, or maybe it's just that the license is flawed. It's dumb to expect people who run billion-dollar publicly traded corporations to be "ethical". Mom and pop shops are "ethical".
  The whole concept of "free software" as encoded by the GPL is increasingly being outmoded by things like server-bound distributed applications (see that clumsy Affero GPL) and companies like Google which have strategic interests in the stuff. It's called progress.
Frugal Google by Sundroid · 2005-03-03 06:22 · Score: 3, Insightful

The word "cheap" is used 4 times in the C|Net article that describes Google's "secret of success": "buying relatively cheap machines", "cheap commodity PCs", "(Power) becomes a factor in running cheaper operations", "not just buying cheaper components".

They say being frugal is a virtue, which Google has, evidently. What is the lesson here? Holding down the cost and being innovative never fail. I guess.

--
Sun and Fun
Re:no AND needed by M00TP01NT · 2005-03-03 06:39 · Score: 3, Insightful

I don't know if this is what TFA was getting at, but on a Google cache page you may from time to time see the phrase "These terms only appear in links pointing to this page: ...".

For example, try searching for "miserable failure" on Google. The first result is George Bush's biography on www.whitehouse.gov.

However, the term "miserable failure" doesn't actually show up (yet) in the biography. But, pages that POINT to the biography do include those terms.

As a result, pages can match your search query even if none of the words in your query actually appear on the page.
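
A rough sketch of how that kind of match could work: index each page under the words used in links pointing to it, not only the words on the page itself. The function names and the example.org/example.net URLs here are made up for illustration, and this is not a claim about how Google actually builds its index.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text term to the set of pages it points at.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(set)
    for _source, target, anchor_text in links:
        for term in anchor_text.lower().split():
            index[term].add(target)
    return index


def search(index, query):
    """Return the pages whose inbound anchor text contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical link data: two blogs link to a biography page using a phrase
# that never appears on the biography itself.
links = [
    ("http://blog-one.example.net/", "http://example.org/biography", "miserable failure"),
    ("http://blog-two.example.net/", "http://example.org/biography", "Miserable Failure"),
    ("http://blog-two.example.net/", "http://example.org/other", "failure analysis"),
]

index = build_anchor_index(links)
matches = search(index, "miserable failure")
```

Here `matches` contains only the biography URL, even though the sketch never looks at that page's own text.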
Re:Laziness, ignorance or by Kashif Shaikh · 2005-03-03 08:19 · Score: 4, Insightful

None of the concepts of computer science are new, but what is groundbreaking is Google touching all aspects of computer science to solve a problem. Distributed databases, replicated filesystems, clustering, learning algorithms, job scheduling, map/reduce languages, etc. are not new. But they applied each of these sub-domains to 'searching' and 'lots of data'. Using old ideas in _new_ ways is groundbreaking. That's what everyone does (like Carmack and DOOM3).
Re:Laziness, ignorance or by akirchhoff · 2005-03-03 08:22 · Score: 3, Insightful

In my experience, you can add "don't want to pay for". Some of the places I have worked for aren't lazy or ignorant of the possibilities; they have made a deliberate decision to work cheap. They will accept the downtime from a quick and dirty design rather than pay for a better design. It's all in the numbers: how much will we lose if we are down?