Google's Technology Explored
RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
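For readers who want the routing idea from the quoted passage in concrete form, here is a minimal Python sketch. It is not Google's code. The `ReplicaSet` class, the `route` function, the `busy` flag and the toy shard dictionaries are all hypothetical names made up for illustration. It assumes each replica set holds one full copy of the index split across several index servers, and that a query goes to the first set that is not busy.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSet:
    """One complete copy of the index, spread over several index servers."""
    name: str
    shards: list          # each shard is a dict: term -> set of document ids
    busy: bool = False    # hypothetical load flag, set by some external monitor

    def search(self, term):
        # Answering one query means consulting every shard in the set.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term, set())
        return hits


def route(term, replica_sets):
    """Send the query to the first replica set that is not busy."""
    for replica_set in replica_sets:
        if not replica_set.busy:
            return replica_set.name, replica_set.search(term)
    raise RuntimeError("all replica sets are busy")


# Two identical copies of a toy two-shard index; the first copy is busy.
shards = [{"google": {1, 2}}, {"google": {7}, "linux": {3}}]
replica_sets = [
    ReplicaSet("set-a", shards, busy=True),
    ReplicaSet("set-b", shards),
]
print(route("google", replica_sets))
```

The same replication that provides the fail-safe also provides the extra throughput: a busy or dead set is simply skipped.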
That's not how Google does it! This is their REAL secret:
http://www.google.com/technology/pigeonrank.html
It really is amazing to think of the amount of information and data that we can access so quickly these days. When I stop and think about what my little search query goes through to bring me an almost instant response, it almost seems impossible. Of course the search engine side of this is only one example, but it's a nifty insight into how powerful our infrastructure is these days. Bravo, mankind.
The technology that is truly astounding is Google's ability to cache itself. Yeah, think about THAT one for a while.
It's also amazing how much of the general knowledge of the world we *can't* access, because it's unconnected or unpublished.
Just think about how vast and extensive Google's search is, and then think about how little of the World's knowledge and creative achievement it actually can access.
The quantity and breadth of human knowledge is breathtaking, no?
It's been tried. From TFA:
One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.
I'm not good in groups. It's difficult to work in a group when you're omnipotent. - Q
Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cooking" is a good match even though it contains none of the query words.
One word: cooking.
I'm sure the principle is sound. I just think the example is a leetle bit flawed.
What I say does not represent the views of my employers, my friends, my cats, or myself.
Here it is, from one of the Google guys:
Google: A Behind-the-Scenes Look.
Simpy
I always thought that they used NT + an Access database.
They should make a googleCluster Live CD, a la clusterKnoppix... or perhaps use more of clusterKnoppix's features, or openMosix, to share CPU and memory.
sourceforge is begging for something like this..
Their engineer desktops have special Google builds of Linux which help them compile things insanely fast with g4, i.e. a hacked p4 (Perforce).
They also have one of the best intranet sites I've seen. Lots of info and services the employees can use, apart from email.
The internal blogs really help with keeping track of projects you're not working on, and what others are doing. Their mailing lists are often useful too; for example, there's a lost and found, a for sale list, and a biking partners list. All kinds of useful little stuff, taking care of the people with nice little things. Lots of reading too.
-- Robi
Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.
"You don't have just one data center," he said, "you have multiples."
The real idea behind Google Maps is that as the server catches fire it uses its last cycles to send an email to the nearest fire chief and include a map. I think it would also throw in a Gmail invite as an incentive.
.\.\att Clare
Computer programming languages are great, and I love them, but that does not mean that you have to use them for everything
open browser at www.google.com
get a drinking duck thing that bobs up and down hitting F5 every second
seems better to me.
They're not obligated to share unless they are planning on redistributing the software. They are perfectly free to patch their own software and use the patched versions for their servers without sharing those modifications.
The GPL does not force them to do anything unless they wish to redistribute the software.
this is a sig.
Anyhow, the article mentioned that in these early datacentres they experienced something like a 25% hardware failure rate, but that it didn't matter because the software worked around it and the hardware was cheap.
Here's a link to the page where I read all this neat stuff. It's probably mostly about the same stuff as the article we've all just slashdotted, but I won't be able to tell for a while....
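The "software worked around it" part can be sketched roughly as below. This is only an illustration of failover across cheap, unreliable machines, not the actual Google implementation. The `fetch_with_failover` function and the `dead` and `alive` stand-in machines are hypothetical, and a failed machine is modelled as a callable that raises `ConnectionError`.

```python
def fetch_with_failover(replicas, key):
    """Try each machine that holds a copy of the data; skip the ones that fail."""
    failures = 0
    for machine in replicas:
        try:
            return machine(key)
        except ConnectionError:
            # Cheap hardware dies often, so just move on to the next copy.
            failures += 1
    raise RuntimeError(f"all {failures} replicas failed for {key!r}")


# Hypothetical stand-ins for real machines.
def dead(key):
    raise ConnectionError("machine down")


def alive(key):
    return f"value-for-{key}"


# Two of the three machines are down, but the request still succeeds.
print(fetch_with_failover([dead, dead, alive], "chunk-17"))
```

As long as at least one copy is reachable, even a high per-machine failure rate stays invisible to the caller.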
Never eat more than you can lift -- Miss Piggy