What's It Like to be Google's Boss Techie?
We'd like to welcome Google Director of
Technology Craig Silverstein as our next Slashdot
interview victim... err... guest. You think you
run a big Linux server farm? Craig's is bigger.
Think your Web site gets a lot of traffic and
creates a lot of headaches? Just think what Craig
must face! Post whatever you'd like to ask Craig
below, one question per post. About 24 hours after
this runs we'll email Craig 10 of the
highest-moderated questions, and we'll post his
answers shortly after he gets them back to us.
Google always seems to be early-to-market with some really highly developed software solutions, and also always seems to have the backbone to support them.
I'm curious -- what drives the innovation? Is it the hardware team advancing architecture to permit the software team more room to play, or is it the software team saying, "Hey, look what we got!" and the hardware team dropping the iron to implement it?
I understand there must be some level of synergy, but is it completely seamless or is one side of the equation effectively driving the other?
Leem
What type of machines/setup does Google use?
(I've heard thousands of PCs with everything in RAM, but I'd love to hear it from the horse's mouth)
A relatively simple, non-intellectual question, but I've always wondered: just how many hits/how much bandwidth do you handle, and how many servers do you need to handle the load?
suwain_2
Does Google's policy of "ranking" the sites that have hits favor the "big guys" over smaller, more specialized, lower-traffic websites? That is, would a story on a site like CNN get a higher ranking in Google on the keyword "Gulf War" than, say, a site (gulfwarveterans.com) that deals 100% with the Gulf War? Do you think you are contributing to the commercialization of the web (i.e. favoring the big power players) over smaller sites?
but I noticed a few months ago that Cisco now uses the Google engine to search the CCO. Congrats on that one. I've also noticed this new search box that Google is starting to produce, and it looks *very* cool. So my question is basically: which is more important to your job, the website or selling the service and the engine to people who need it?
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
Has there been any progress on the Pigeon Computing initiative?
I am the evil aardvark!
A little old but interesting.
g oo gle.
The Technology Behind Google 2000-10-19 (1hr 13min) By Jim Reese, Chief Operations Engineer, Google. How to build an internet search engine that indexes 1-2 terabytes of data (200 million web pages) and serves it up at a rate of 1000 requests/second. (Hint: Start with a farm of 10,000+ Linux servers). The technology behind Google: company overview, search parameters and results, hardware and query load balancing, Linux cluster topology, scalability, fault tolerance, and more. [420]
http://technetcast.ddj.com/tnc_search.html?key=
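A quick back-of-envelope reading of the figures in that abstract, assuming decimal terabytes and treating "10,000+" servers as exactly 10,000. These are averages only; the per-server figure in particular assumes every machine shares query traffic equally, which is an illustrative simplification, not a claim about how Google actually splits the work.

```python
# Back-of-envelope figures derived only from the numbers quoted in the talk
# abstract: 1-2 TB of data, 200 million pages, 1000 requests/second, and
# "10,000+" Linux servers. Assumption: decimal terabytes (10**12 bytes).

PAGES = 200_000_000
INDEX_BYTES_LOW = 1 * 10**12   # 1 TB
INDEX_BYTES_HIGH = 2 * 10**12  # 2 TB
QUERIES_PER_SECOND = 1000
SERVERS = 10_000               # lower bound; the abstract says "10,000+"
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_page(total_bytes: int, pages: int) -> float:
    """Average number of stored bytes attributable to each page."""
    return total_bytes / pages


def queries_per_day(qps: int) -> int:
    """Total queries in one day at a constant request rate."""
    return qps * SECONDS_PER_DAY


def queries_per_server_per_second(qps: int, servers: int) -> float:
    """Average load per machine, assuming (simplistically) an even split."""
    return qps / servers


if __name__ == "__main__":
    print(bytes_per_page(INDEX_BYTES_LOW, PAGES))   # 5000.0
    print(bytes_per_page(INDEX_BYTES_HIGH, PAGES))  # 10000.0
    print(queries_per_day(QUERIES_PER_SECOND))      # 86400000
    print(queries_per_server_per_second(QUERIES_PER_SECOND, SERVERS))  # 0.1
```

So the quoted numbers work out to roughly 5-10 KB per page and about 86 million queries per day.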
I am wondering why they chose Linux. Specifically, I wonder how they made the choice among all the major OSes (Linux, *BSD, Solaris and possibly Windows), as well as the software they use to power the site.
The Internet is always described as a distributed system with no single point of failure. Google, however, has quickly become by far the most popular method of locating information. "Surfing" has been killed by modern search technology; it's so much easier to look through Google than through the Web itself. If Google were down, I'm sure the Internet would be far less useful.
Do you think Google has become an Internet point of failure? With the competition for larger and larger indexes, is the Internet becoming centralized? Do you think this is a bad thing?
What are you doing to prevent the new generation of more sophisticated search engine spammers: spammers that use advanced software such as WebPosition Pro, spammers that feed fake pages to the Google crawler, spammers that make bogus link pages to their own sites? Doesn't this new level of sophistication on their part mean that Google must rely to a greater degree on human website reviewers, such as those provided by the Open Directory Project?
As a new network configuration guy, I am often stumped by a problem. I usually turn to google first, and my supervisor second. What has been the biggest problem that you have dealt with that will stand out in your mind years from now? As the "Head Techie", where did you turn, and what was the eventual resolution?
I'd rather you do it wrong, than for me to have to do it at all.
---- The devil is in my pants! Look, look!
Does Google plan on releasing more products like the Google Search Appliance in the near future, specifically ones geared more towards the consumer level rather than the business market? I would, personally, love to have some sort of Google search engine on my machine to rummage through all the stuff I have. Does Google plan on expanding into this market, or will you remain focused on the web?
I know, I know, only one question, but it begs to be asked: how well is your technology going to be able to scale? Considering the near-exponential growth of the internet, will PageRank be able to keep up?
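For readers wondering what "keeping up" means computationally, here is a minimal sketch of textbook PageRank computed by power iteration, assuming the damping factor of 0.85 suggested in the original PageRank paper. It is not Google's production implementation; the function name, parameters, and tiny example graphs are illustrative.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration (illustrative, not Google's code).

    links maps each page to the list of pages it links to.
    """
    # Every page that appears as a source or a target gets a rank.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # "Random jump" share that every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Pages a and b both link to c; c links back to a.
    print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

Each iteration makes one pass over every link in the graph, so the work per iteration grows with the number of links; that repeated full pass is the part that has to scale as the Web grows.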
I understand that Google was using large numbers of IDE drives in lieu of more expensive but individually faster SCSI devices. What prompted the decision, and how have the concerns about reliability and performance been mitigated? What special technology, if any, was used to implement such a system?
...as to what exactly Google does with the concepts it receives through the various Google-tech contests held. Have these ideas been made good use of? Do we see any of this in the Google we use every day? What about the ones that didn't win, do we see any of them?
What's Google's language of choice for web page building? I'd assume speed is most important, so what language makes Google so fast?
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
Is there anything new that Google is working on that is not currently displayed in your labs section? If so could you explain it to us?
If you could sum it up in a nutshell, maybe you should be writing O'Reilly books. --- Domasi 2001
Google's success has been well documented. Quick, relevant results are its trademarks. Do you see any backlash against Google, and what are you doing about people who use Google's success for their own purposes (i.e. Google bombing)?
I have a shitty sig!
Hi Craig. Google is my favorite search engine, mostly because it's so simple, fast, and has a very professional feel.
I wonder, when you're in charge of something as huge as Google, are you on call 24/7 in case something goes wrong? Have you ever been called during, say, a nice dinner, or worse, in the middle of the night? Thanks.
as Google got more popular and eventually reached the status it holds today, did you feel any pressure (either internally or from outside the organization) to switch from a Linux-based cluster to a proprietary solution (Windows comes to mind, but there are others)? Were you (or others at Google) affected by any of the FUD that is put out, and did it affect your perception of Linux as a viable solution?
... what Linux distribution does the world's largest server farm use?
It seems that Google's great success is partly due to research coming out of the academic world. How many Google employees have advanced degrees, and can they publish non-proprietary research after they join Google? How do you see the interplay between high-tech and academia?
HOWTO get better dates on slashdot
What type of database backend do you use, and what led Google to choose it?
Since sites like Slashdot don't like to give out their statistics, I'd like to ask: what percent of users use what web browser? Also, what percent of users use what OS?
Does Google use any natural language processing (when dealing with web pages, queries, etc.)? Are you planning on doing more with NLP in the future?
She sat at the window watching the evening invade the avenue.
Hypothetical, and hopefully fun, question.
Now, assuming you had an infinite IT budget, which configuration of hardware/OS (e.g. PC/Linux, PC/Win (boo hiss), IBM/AIX, VAX/VMS, Cray/Unicos, etc.) would you generally adopt, and why? More specifically, if pure performance were the only consideration, which would it be? Alternatively, if uptime were the primary consideration, which would it be?
Be honest and don't worry about the biases of your audience.
How have these affected you and your job, and what are your feelings on this subject?
Xaotik Designs
What kind of bandwidth/pipes/networking setup do you use -- and how does your "macro" capacity diffuse down to each clustered server?
Basically, what's your setup and how does it work?
I have but one question... Who is the mastermind behind all the "special" logo changes that Google experiences throughout the year?
My hat's off to that team!
-= Xafloc =-
alinuxbox.com
How has Google remained competitive?
Why, in this day and age, does Google continue to penalize sites that are virtually hosted? With IP addresses becoming harder to get/justify every day, why does Google discount the relevance of links that don't come from a unique IP address? Please don't just deny it; I think the Internet community deserves an explanation.
Google recently ran its "first annual programming contest," with the winner receiving $10,000. Many slashdotters suspect this was simply a way to recruit new talent. So, was finding new people one of the initial goals for this project, and have you hired any new programmers as a direct result of it? What other goals (PR, generation of new ideas, etc.) were there?
Engineers aren't boring people, we just get excited about boring things.
It's well known that you use Linux in your mega clusters. I was wondering if you have ever been approached by Microsoft, Sun, or HP in an effort to switch to their proprietary OSes.
I can't imagine that you haven't. It must have been a huge decision to invest in one technology, so are you satisfied with what you have?
Moderation: Put your hand inside the puppet head!
Recently, the English division of our company [Black and Decker] hired 'HyperMedia Trafficing' or some other similarly named company to get them 'more exposure' in the search engines.
[forget the ethical debate about that .. or why no one bothered to ask me what to do.]
What I want to know is - going forwards - as more and more of these companies start up, and discover more and more unscrupulous ways of 'loading' the search engines with bogus hits/visits/data/etc. .. how does Google plan to make sure they are:
1) Not losing ad $$ to these folks
and
2) Preventing every search from returning something like www.hotgrannysex.com or www.top50.com as the 1st (or first 15) results for a search on .. well .. pretty much anything.
--Don't you dare delete my hard disk, I have not made a fatal error!
No offense to Mr. Silverstein, but I'm much more interested in Cindy! Beautiful, highly successful nerds are terribly rare!
Just so I'm not off-topic:
Mr. Silverstein, how does Cindy look in tight sweaters?
Drool...
Talisman
"Study your math, kids. Key to the universe." -The Archangel Gabriel
One of the most impressive things about Google to me is how easily you seem to have embraced an open model. I realize the outward view of a company can be quite different from the internal view. How easy is it actually to make decisions such as opening APIs? If it's easy, can you give some advice on how one might convince their boss?
Thanks,
-Dave
"as plurdled gabbleblotchits on a lurgid bee" - Prostetnic Vogon Jeltz. (One man's humorous is another mans flamebait)
This shouldn't be modded down. The question just needs to be asked in a clearer way.
What job opportunities are there at Google, and what opportunities in the industry as a whole?
Hi Craig!
I think Google absolutely rocks. It has by far the most intelligent/helpful search engine results. Thanks for the great service.
Now onto the questions- what is the Google vision / strategy for the future? Where can Google go? From a search engine perspective, what are some of the challenges that you have and improvements that can be made (perhaps speeding up crawling to make the latest content available, for example)? How are you going about solving these challenges, and when can we expect them to be implemented?
On a similar note, I've noticed that Google recently announced a "google box" that allows corporations to take advantage of the Google search algorithms and indexing. Are any more products like this being planned?
Is there any way we can find out what kind of suggestions Google receives from the public? It would be quite interesting to look at them all, and maybe some of them could now be implemented using the Google API.
-- Ed Avis ed@membled.com
I have a number of web servers, some Unix, some Windows, and the number of attempted attacks each day from different IPs must run to about one hundred. It is mostly people trying to execute commands or using malformed URLs to exploit some known past security hole. My question is, how many attempted attacks each day do the Google servers get?
Martin Piper
Owner - ReplicaNet and RNLobby
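One way to put a number on "attempted attacks per day" for your own servers is to scan the access logs for known exploit probes. A minimal sketch, assuming Apache-style Common Log Format lines and a small, illustrative list of substrings from probes common at the time (Code Red requested /default.ida; Nimda probed for cmd.exe and root.exe). A real tally would need a much fuller signature list.

```python
import re
from collections import defaultdict

# Illustrative substrings only; NOT a complete attack signature set.
SUSPICIOUS = ("default.ida", "cmd.exe", "root.exe", "../")

# Common Log Format: host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^:]+):[^\]]*\] "([^"]*)"')


def attack_summary(lines):
    """Return {day: (suspicious_request_count, distinct_source_ip_count)}."""
    counts = defaultdict(int)
    sources = defaultdict(set)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip lines that are not in Common Log Format
        ip, day, request = m.groups()
        if any(s in request for s in SUSPICIOUS):
            counts[day] += 1
            sources[day].add(ip)
    return {day: (counts[day], len(sources[day])) for day in counts}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:  # path to an access log
        for day, (hits, ips) in attack_summary(log).items():
            print(day, hits, "suspicious requests from", ips, "IPs")
```

Counting distinct source IPs alongside raw hits helps separate one noisy scanner from a worm hitting you from many infected hosts.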
Google is a great free public resource. My concern is that it has to be expensive running a resource like that. I know Google's strategy is somewhat to use the free resource as a loss leader to promote your search technology, but the key word in "loss leader" is "loss". It's a great theory as long as you are able to find people who want and need your search technology.
So my bottom line question is this: Does the web site pay for itself via the advertising? Is there a possibility that someday Google may decide the web site costs too much money to run if you get to a point where your reputation no longer needs the loss leader?
Sometimes it's best to just let stupid people be stupid.
It would be great if you did a documentary feature with TechTV or someone, because it's one thing to read about your facility, but it would be another to see it.
Thanks for all of the help I've gotten from Google.com; I don't think I'd still be in school without it.
Paradesign
PS, even just a photo feature on the site would be nice.
I want 2D games back.
Anyone who has ever needed a piece of information that was on a broken page will agree that the Google page cache is perhaps one of the most underrated and useful parts of your search engine.
There's one problem that everyone has with the cache, however - you don't deep-nest the caching, so that following any links on a cached page will lead to the original (probably broken) site, instead of to another cached page. Is there a technical or legal reason for why it works this way? Any chance we'll see deep caching at some point?
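To make the "deep caching" request concrete: a cache would need to rewrite each link on a cached page so that it points at the cached copy of its target rather than the live site. A minimal sketch, assuming a hypothetical cache URL scheme (CACHE_PREFIX below is made up, not Google's) and handling only double-quoted href attributes with a regular expression, which is a simplification; a real rewriter would use a proper HTML parser.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical cache endpoint, for illustration only.
CACHE_PREFIX = "https://cache.example.invalid/view?url="

# Simplification: only matches href="..." with double quotes.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def to_cached_link(page_url: str, href: str) -> str:
    """Resolve href against the cached page's URL, then point it at the cache."""
    if href.startswith("#"):
        return href  # in-page anchors stay as they are
    absolute = urljoin(page_url, href)
    if not absolute.startswith(("http://", "https://")):
        return href  # leave mailto:, javascript:, etc. untouched
    return CACHE_PREFIX + quote(absolute, safe="")


def rewrite_links(html: str, page_url: str) -> str:
    """Rewrite every double-quoted href in a cached page to its cached form."""
    return HREF.sub(
        lambda m: 'href="%s"' % to_cached_link(page_url, m.group(1)), html
    )
```

A real implementation would also need to decide what to show for link targets that are not in the cache at all.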
I expect that the California heat and thousands of boxen require special measures to prevent overheating. What kind of measures do you take for keeping your server farm operating normally at a cool temperature?
How do you avoid business pressures to make short-sighted solutions, and consistently make good, common-sense ideas work instead of adopting ones from marketing sources? Not only does Google have the best search engine technology, but you consistently do the "right" thing: clean, quick homepage, text-only, well-identified ads, interesting research projects, etc. This is the way many search engines start, but they all went the way of the "dark" side instead of adopting the "right" solution. In my jobs, it's been very difficult to execute and justify good engineering (or just common sense) under pressure from the people who control the money. Any advice for driving through well-thought-out decisions instead of adopting the "management fad of the month"?
Not to be too "X-File'ish", but does there come a point where too much knowledge is captured in Google? A point where anything that doesn't exist in Google doesn't exist, period? Wouldn't that represent a very tempting target for a bin Laden or a John Ashcroft, to try to control how the modern world thinks?
Kind of far out there, I know, but do you guys worry about this kind of thing?
sPh
Many sites, when referenced by Slashdot, crumble under the load. Can you folks see any difference, either to your "main" servers (www.google.com) or your cache servers?
Stupid job ads, weird spam, occasional insight at
Just curious when mod_google is going to be released for the apache webserver. It would be nice to have the power of Google indexing available to those of us without significant IT budgets (i.e. wife won't let me "buy another #$*@! computer").
How do you balance load among the www.google.com servers? Do you guide users to local servers (such as www.google.co.uk)?
What's the worst thing ever to happen to the Google server farm? (Besides the pigeons gnawing on cables)
Tim Dorr
Owner/Manager
A Small Orange
How does google deal with denial of service attacks, particularly distributed ones?
The rest of us just suck it up with fat network pipes, but a high-profile target like google would be the holy grail of Internet vandals.
Has anyone ever poisoned your DNS servers, effectively taking Google down even though the servers are up? Or successfully inserted bogus WAN routing info into the Internet, again effectively bringing down Google even though the servers are fine?
What's your worst cracker/net vandal story?
To what do you credit the popularity of google? Do you consider google a "success," or are you holding out for thousands of employees and billions in cash flow?
That's for May. Of course, it's all at the Zeitgeist, as linked in other responses. I don't blame you for not knowing about it, though; I've tried to find it from Google's web page, but couldn't until I searched for it (using Google, of course).
Personally, I'm usually pretty drained after a fun day staring at the screen and typing like a monkey, and sometimes completely avoid the PC when I get home, preferring to chill with a decent book (currently Cradle to Cradle), zone out in front of the TV, or go cycling in the beautiful Isle of Man (watch "Waking Ned Devine" for an idea of the scenery - jealous?<grin/>).
So I guess my completely-non-tech question is:
What do you do in "loafing" time (ie. loaf - To pass time at leisure; idle.), when you've left the office, "lost" the pager/Blackberry/PDA/mobile etc., and got away from it all?
Cheers,
Do you expect widespread usage of RDF/DAML/OWL/TopicMaps for explicit meta-data annotation of web resources, or will it be used only in small circles of specialized content providers like academia, or maybe not at all?
How will Google react? Do you plan to use meta-data provided by web resources if found, and how will you decide if it isn't just made up to get people on some bogus pr0n site (like with those <meta>-Tags today)? Will it someday render the brute-force approach of full-text-indexing obsolete?
Programming can be fun again. Film at 11.
Have you ever considered setting up a distributed search engine client to expand your server farm through your users systems?
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
Google has become such an important part of the Internet for millions of average users. With this in mind, my friends and I often joke about what would happen if (knock on wood) Google were to go out of business. I suggest that ICANN should do something useful for a change, and fund Google as an official, non-profit project for searching the net.
Although I have heard that Google turns a good profit, what exactly is preventing Google from becoming a not-for-profit organization? Couldn't Google take the extra income from licensing its search to create better search technologies and pay the employees, rather than make some shareholders rich? Wouldn't this perhaps make Google a more sustainable organization?
At least once a week now, I read someone who proclaims that "I no longer even use bookmarks or try typing in URLs. I just always go to Google for my information." Has anyone approached you (or have you considered yourself) producing a Web Browser which has no URL line, but instead has a Google line to automatically send anything typed there to Google as a gateway to the Internet? Seems like it would "sell" to the Google-holics.
You've got some incredibly cool people on your Technical Advisory Council. How often do you interact with them?
If I had mod points right now... Well, it's at 5 already.
It breaks my pluginses, my precious!
They have a nice graph, but no scale. I suppose you could do some careful pixel analysis of the graph to generate percentages, but it's a shame they don't list them.
Interestingly, I see "Other" has been steadily rising since it bottomed out in January, and has now surpassed Netscape 4. I would love to be able to click on that chart and see a detailed list of the percentages, and what "other" is composed of. Hopefully we'll see Mozilla get its own line on the graph soon.
It would also be nice to see a breakdown on a per-OS basis. I wonder how many people are running Internet Explorer on Linux? (Seriously, that would indicate what portion of non-IE users hack the browser tag to make web sites happy.)
Non-Linux Penguins ?
Google is an incredibly popular and effective website. I'm curious about the amount of pressure you have to expand in order to "stay competitive" or "aptly serve consumers' needs". Is there any kind of a push to go the way of Yahoo or Amazon and try to include EVERYTHING on that simple page? As things evolve, do you really see Google staying the top engine in 3 to 5 years?
indeed..
I've made some really stupid posts to the newsgroups in the past and I used my real name. Can you delete them for me?
Evil ZEN Scientist
How do you guys manage thousands of servers spread throughout multiple datacenters?
How do you handle user accounts? Event notification?
Do you guys use "enterprise" software like Tivoli or Openview, or did you roll your own solution?
Conformity is the jailer of freedom and enemy of growth. -JFK
will you answer it?
Why do you have a neurosurgeon on staff?
What would it take to Slashdot Google? What do you do to avoid this? Have you been Slashdotted before, either from Slashdot itself or from some other link?
Carousel is a lie!
How can you possibly test bugfixes/changes that need to get deployed to thousands of machines? Furthermore, how in the heck do you deploy the changes once they're tested? I understand you probably can't describe the exact process, but perhaps you can enlighten us on some principles learned on the subject of CM on such a massive scale.
After the introduction of the Google API, some people, especially from the REST camp, criticized the use of SOAP, claiming it just adds superfluous bloat and is generally "unwebby". What do you think about this?
that comes to mind when I think of a huge server farm like Google's: can you give a rough order of magnitude (# of zeros maybe) on what your electric bill is?
Thanks very much for Google. The more I use it the more I appreciate it.
Everyone will ask about bandwidth, incoming lines, etc. (All the network capacity and capability stuff). Here's something a little more off the beaten track:
What technologies help to support the Google server farm? What kind of automated monitoring and trouble reporting tools are in use? Are they home brew, open-source, or COTS with some customization (scripts, etc)? And if you had to point to one area of network management and say "we could use some improvement or some better tools", what would that area be?
BTW - Google Rocks! I never use anything else anymore!
-- Mal: "Well they tell you: never hit a man with a closed fist. But it is, on occasion, hilarious."
They have already answered that one...
;)
Like everyone else here, kudos for a truly useful and fun utility.
Google seems to be a classic case of fast growth. What have you been doing to try to maintain Google's unique culture as you grow? In particular, as you add more services to Google and the interface becomes more complex, so too will your internal organization. Will a big Google become a Dilbert-like Google-plex?
Are you guys making enough money?
:)
I wish you'd give us some banner ads or something, I feel guilty. I don't want Google to go away.
Seriously, why don't you serve banner ads?
-Dan
There has been much debate about what the practical purpose for Google Voice search might be, could you fill us in? Is it really for use in cars?
In fact, there is an opposite concern. Whether through a network of links or through coordinated googlebombing [googlebombing.com], weblogs frequently show up near the top due to the nature of reciprocal linking between the blogs. Not saying that's good or bad (sometimes a sole voice is a better expert on a topic than CNN), but it is what it is. Ranking "links" seems valid enough, but then you ask if that includes machine-generated links by someone's aggregator and the issue becomes a little more cloudy.
Ahoy, love the Google, it's the only engine I trust these days. Nevertheless....
For a site where speed and information delivery are of the utmost importance, an archaic table-based design seems rather strange. Is there any reason you haven't yet switched to a more forwards-compatible XHTML/CSS design? (Note that by "design" I mean the HTML and CSS more than the visual appearance of it)
For my own amusement, I've been looking at recoding the google design in CSS, and it's really not that hard.
Thanks!
Karma: T-rexcellent.
Why can't I find a cache'd version of this page anywhere? If the live cache'd version goes down, and there's no cache'd cache'd version, whatever will we do? :)
dmarien
Why haven't you yet implemented the toolbar for open source browsers? Are there technical difficulties, or is it rather a lack of interest from Google?
Can we get some basic stats with the interview? I mean, we all know that Google gets a lot of traffic but how many hits per day/hour/minute? How big is the server farm? How much bandwidth are they eating? How about some other interesting stats? (I'm sure they have plenty!)
dbc
Can you tell us anything about how you are working
with the various intelligence agencies to provide
them information about searches that are of interest to them?
Are you thinking of providing SSL access to your
web pages so that these agencies will have to work
with you instead of just monitoring your network
traffic?
How/why in the world was a Google search in Klingon developed!?!
How does Google benchmark software? Eg how do you benchmark Apache, SQL, your CGIs etc...
Luck favors the prepared, darling.
From one interview...
Jason: What led to Google's decision to use Linux? When did that start?
Sergey: Well, Larry Page and I were in the Stanford PhD program in Computer Science. And we developed Google there. The way the computer science program worked is there was a hodgepodge of computer equipment lying around, and we would grab whatever scraps we could. We had all kinds of computers: HPs, Suns, Alphas and Intels running Linux. So, we gained a lot of experience with all of those platforms.
When we started Google, we had to make the decision of what we wanted to use. Of course we chose Linux, because it is the most cost effective solution.
PCs are not only much cheaper these days, but we can also get them very quickly, because they're such a commodity item. That's an incredible benefit. We just installed another 1,000 computers and we got that done in a few weeks. That's really hard to do with any other kind of workstation. I think that's an advantage that people don't entirely realize.
Jason: Did you view it as being better, or was cost the main reason?
Sergey: It was better in some ways. Certainly for our purposes, we felt the support was better. For example, the actual kernel authors will respond to problems pretty quickly. They are especially responsive to Google nowadays, since we're so widely used. We can have a 15 minute turnaround. You can't really beat that for support.
That was an important factor, but frankly, the cost was a bigger issue. PCs are so cheap, which is very important. Sun's Solaris is probably more stable than Linux on PCs. It's hard to determine the blame, whether it's the hardware or the operating system. But, it's a minor difference.
Jason: Then, does all of your support come from newsgroups or do you actually pay for it through Red Hat?
Sergey: We have an operations team of about ten people, which helps a lot. And other than that we check newsgroups and e-mail the authors of the code. Usually, if it's a problem we can't figure out, we go straight to the authors.
Jason: Is Linux used on desktops at Google?
Sergey: It depends. Engineering mostly runs Linux. Business development/marketing runs Windows. Actually, I use Linux with VMware running Windows. Some people have two computers, particularly some people in engineering who do UI development and need to test things out on Windows platforms. I find it better to just use VMware and have one computer.
Jason: In a technical sense, what does Linux lack? What does it not provide?
Sergey: The 64-bit file system, which I know they are working on. It's slowly coming around. I think there are still occasionally some stability issues. I'm not saying Linux is unique in that respect, but you definitely want to have reliability. There are some issues dealing with higher memory systems. If you get to 2GB, and you try to push it past that, we encounter various problems. I know we've had some trouble with the network stack when we really push it hard. In terms of having lost most connections from lots of different machines.
And from another...
How is Linux used at the Google Projects? Why was Linux chosen to improve the Google search engine?
Sergey Brin: Actually, we currently run over 6,000 RedHat servers.
Linux is used everywhere...on the 6,000+ servers themselves, as well as desktop machines for all of our technical employees. We chose Linux because it offers us the price for performance ratio. It's so nice to be able to customize any part of the operating system that we like, at any time. We have a large degree of in-house Linux expertise, too.
Most of our administrative tools were developed in-house, as well.
I don't remember what HTTPd they're running, but it sure as hell isn't Apache. Someone said that they get 1k hits per SECOND; what do you use to shape that insane amount of traffic? What is the '/search' page coded in? What databases are used to index a terabyte of data? How do those 10,000 nodes find the data they need so quickly? What sort of interlinks are used?
;)
How do you build a cluster like a war machine, in other words?
From what I've read about Google, it seems like the same server farm nodes spend time on both searching (crawling, indexing, and storage) and on queries (web searches from customers). Is this really the case? If so, is ad delivery another part of this system being carried out by all nodes? Put another way, how homogeneous are your server nodes: do they all do an equal share of searching, responding to web requests, and participating in the ad system? If server farm nodes are not as homogeneous as I'm thinking, then how are the different functional aspects of Google's service broken down -- crawling, indexing, storage, queries, ads, and any administrative services you need internally -- and how much of your resources are being thrown particularly at the ad serving aspects of your site? Do you have some machines focusing on ads, or is that folded in with search queries, and that in turn is folded in with the actual business of searching? No matter how it's broken down, I can imagine that it must be fiendishly complicated, and I'm continually impressed at how smoothly you manage to make it work.
From a business standpoint, how happy is Google with the ad strategy being used? Is it producing a significant portion of your revenues, or are you getting much more from the search services & hardware solutions you're providing to paying clients? How flexible is the current ad delivery system? I.e. if you're selling keyword matching to ad customers on a system distributed across thousands of servers, and promising those customers that they'll get, say, 100,000 page views, how is this work synchronized across the servers doing search queries? It seems like this could all quickly get in the way of the search services Google is really trying to offer, but it's hard to say whether it would help more to do it all "inline" with the rest of the site (possibly slowing everything down fractionally) or to break it off into a separate system (adding more internal network traffic, potentially making it harder to do up-to-date reporting, etc.). More broadly, what tools are available to you internally for monitoring your overall quality of service? Do these systems co-exist with the rest of the site, or are they also broken away -- and again, if they are, how do you keep reporting information current enough to be useful?
Basically, I'm curious about the infrastructure, both from a technical and a business perspective. There have been a number of papers & articles over the past couple of years documenting how Google maintains its server farm for delivering search services to users; I'd like to know more about what's going on at the back to keep that forward-facing system running so well.
Are there plans to index audio files (and the audio tracks of video files) so that these could be searched as well? I would guess that existing speech recognition packages could be reused for this purpose so that development would not be too complicated.
Recognizing text in images and videos and indexing that would be a similar task. I know that Google Catalog Search must be doing some OCR already, but I have no idea if this would take too many CPU cycles if applied to all images, or if there are other problems (the images themselves already get downloaded for the image search, so bandwidth should not be the problem).
I have Google as my home page. I have the IE toolbar on my Windows machine and the Galeon equivalent on my Linux box. In short, Google is normally my first port of call.
Despite this, I've always made a conscious effort to keep a backup search engine that I try if I ever find myself using more than a "few" of the results from Google. I find that if there are, say, 10 good pages from Google, there will be at least 3 that AllTheWeb (for example) will find that Google didn't have in the top 50.
I've decided that it's very dangerous to use one SE to the exclusion of all else, because there will always be holes, index bias, algorithm oddities, etc. that hide some of the info you wanted. As long as your other SE isn't simply playing follow-the-leader, you'll benefit from the little effort of using a second source.
The idea is old, of course; any researcher will tell you that having one source is poor methodology.
Does Google recognise this idea and is there anything you plan to do about it?
A super-advanced search that allowed you to alter the weightings of link importance, document age, the domain it sits on, etc.? That'd surely be too much work for the average Joe.
I don't have an answer or I'd be sat coding it up but I do find searches that simply work better elsewhere. Maybe if you had 3 algs with a 70%, 20%, 10% mix in the results we'd all be a little richer.
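A minimal sketch of the 70%/20%/10% idea, assuming three hypothetical ranking algorithms that each hand back an ordered list of URLs (the `blend` function and the `a1`/`b1`/`c1` results are made up for illustration; this is not how Google merges anything). It interleaves the lists with smooth weighted round-robin, so each algorithm gets roughly its share of the first page, and duplicate URLs are skipped:

```python
def blend(rankings, weights, k=10):
    """Interleave ranked result lists so list i fills about weights[i]/sum(weights)
    of the first k slots (smooth weighted round-robin); duplicate URLs are skipped."""
    seen, merged = set(), []
    cursor = [0] * len(rankings)  # next unread position in each list
    credit = [0] * len(rankings)  # accumulated round-robin credit per list
    while len(merged) < k:
        active = [i for i, r in enumerate(rankings) if cursor[i] < len(r)]
        if not active:
            break  # every list is exhausted
        for i in active:
            credit[i] += weights[i]
        pick = max(active, key=lambda i: credit[i])  # ties go to the earlier list
        credit[pick] -= sum(weights[i] for i in active)
        # Take the next URL from the chosen list that isn't already on the page.
        while cursor[pick] < len(rankings[pick]):
            url = rankings[pick][cursor[pick]]
            cursor[pick] += 1
            if url not in seen:
                seen.add(url)
                merged.append(url)
                break
    return merged


# Hypothetical output of three different ranking algorithms.
alg_a = [f"a{n}" for n in range(1, 11)]
alg_b = [f"b{n}" for n in range(1, 11)]
alg_c = [f"c{n}" for n in range(1, 11)]
print(blend([alg_a, alg_b, alg_c], [70, 20, 10]))
```

With weights 70/20/10 and ten slots, the first algorithm fills 7 slots, the second 2 and the third 1, spread through the page instead of bunched at the bottom, which is the advantage over simply taking 7, then 2, then 1.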
Discuss...
0daymeme.com: Great stuff.
Google's PageRank technology works very well on the web with lots of pages pointing to lots of other pages.
The Google Search Appliance, however, is targeted at an office environment. Most of the documents (especially the non-html ones) in the typical office stand alone and do not have links to each other.
How has Google modified or complemented (if at all) the PageRank algorithm to make it more suited to an office environment?
I am currently pushing management at my site to purchase a Google Search Appliance, so I need an answer to this to help justify the change from our existing search application. In other words, without a good PageRank score, how does the Search Appliance order the result set in a useful way?
Dealing in spamware is illegal in several U.S. states and European nations. By and large, spamware programs have no lawful use -- they are built to abuse open relays and proxies, fraudulently alter mail headers, and obfuscate spammed messages to make it harder for victims to track down the spammer. Spamware is not merely a "burglar's tool" useful for lawless action -- it is like a locksmithing kit specifically tailored to be excellent for burglary and no good for legitimate locksmithing, or a gun somehow built to be perfect for murder but nonfunctional for self defense.
Nevertheless, Google accepts ads for spamware -- as well as ads for other spamming services. Google today carries advertisements and thereby accepts sponsorship from dealers in network abuse. Given the real and present danger that spamming poses to the usefulness of the email facility, and the amount of time and money that today's Internet-using businesses and people spend defending themselves from this form of theft -- how can Google justify accepting this sponsorship?
First of all, thank you so much for providing the most useful site on the net.
I understand part of the success of Google has to do with the efficient use of open source/free software. How about in-house software development? Do you folks develop open source software as a way of giving back to the community? What are your thoughts on free software?
A big part of Google's strength is in the supported search syntax, most notably that you can search for phrases instead of just keywords, that you can filter OUT certain phrases or keywords, and that you can search for content on specific sites, or NOT on specific sites. The next step for me and probably a lot of other Unix/Perl types is regular expression support.
For example, let's say I'm looking for '80s Brat Pack member Anthony Michael Hall (not that I would do such a thing), but I can't remember his middle name. Looking for "Anthony Hall" will do me little if any good, but looking for "Anthony \w+ Hall" could do the trick nicely.
Another benefit is that users could do their own limited fuzzy searching, by searching for optional prefixes and suffixes along with the root, instead of having to get the word or phrase exactly as it's indexed.
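Google's query syntax doesn't take regular expressions, but here is a minimal sketch of what the proposed matching would do on plain page text, using Python's standard `re` module. The page snippets, the word list and the `regex_filter` helper are all hypothetical:

```python
import re


def regex_filter(pages, pattern):
    """Return the pages whose text contains a match for the regular expression."""
    compiled = re.compile(pattern)
    return [page for page in pages if compiled.search(page)]


# Hypothetical page snippets standing in for indexed documents.
pages = [
    "Anthony Michael Hall was a member of the 80's Brat Pack.",
    "Anthony Hall Avenue runs past the public library.",
    "Anthony M. Hall signed the guest book.",
]

# \w+ stands in for the forgotten middle name: one run of word characters,
# so "M." (which ends in a period) does not match.
print(regex_filter(pages, r"Anthony \w+ Hall"))

# Home-made fuzzy search: optional "re" prefix and optional suffixes on a root.
words = ["index", "indexed", "reindexing", "indexes", "indexer"]
fuzzy = re.compile(r"(?:re)?index(?:es|ed|ing)?")
print([w for w in words if fuzzy.fullmatch(w)])
```

The first query keeps only the page with a single word between "Anthony" and "Hall"; the second accepts "index", "indexed", "reindexing" and "indexes" but not "indexer", since "er" isn't one of the allowed suffixes.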
Thanks,
John
What is Google doing to keep itself on top? Do you think there is a lot of room for improvement? How do you think web searching can get better?
So I'm truly surprised no one has asked this one yet, as it's the first thing that popped into my head...
The masses of Slashdotters have slashed and dotted many an unlucky website over the years... pushing webservers to their limit and often breaking them outright...
With Google's massive resources, is there any noticeable difference when a /. story gets posted and people go stampeding to Google to find out more? Or is that happening right now? (I'd hate to think of myself as part of a huge herd of individually acting DDoSers, but unfortunately, that's about what it ends up being...)
Sig currently under construction. Mind the gap....
sPh
Since Cyc has done a stint with Lycos, what about Google? Given that Google maintains one of the largest and most relevant databases, couldn't a single question result in huge amounts of additional, relevant information flowing into Cyc?
Everyone is asking about hardware, and to be honest, that's not what makes Google good.
After all, the Wayback Machine does kind of the same thing.
It's the software.
So this is my question:
What browser do you use?
regards
john '1.1alpha' jones
Every major operating system has its example of a major corporation that is perhaps the flagship company associated with the OS, e.g. Yahoo and FreeBSD, EarthLink/eBay and Solaris, Adobe and OpenBSD, and Google and Linux.
So, having had to deal with Linux on a large scale, would you say that Linux was the right choice for what you are doing? With the benefit of hindsight, would you rather have gone with another operating system, such as FreeBSD, OpenBSD, Solaris, AIX, etc.?
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
I think I speak for everyone when I say I'd like to see this Linux cluster Google is running. Just a Matrix-esque shot of the wall(s) of rackmount servers would be enough to make me happy.
--
The Bailiwick - DESIGNHUB2005
So, does the Department of Justice (DOJ) or Homeland Security talk to you? In particular, are you (google.com) being asked to track certain 'security' queries (bombs, anthrax, Command Taco, etc.) on your system?
You say things that offend me and I can deal with it. Can you?
vi or emacs?
Petru
78% of statistics are made up on the spot ;^)
I read recently that you cache many of the more popular pages every 15 minutes, which was a surprise. Exactly how many pages are counted in this "popular" set, how do you decide when to move a page from the normal 28-day rotation to this one, and what's the process for getting one of these pages (say, from my server) cached on yours, indexed, page-ranked and available across your whole server farm for searching?
-- Chatbear - http://www.chatbear.com -- Free messageboards, Highly Customisable
While doing some scientific research, I discovered that the Google Zeitgeist is a very interesting source of information for research in the areas of social and communication sciences (marketing, lifestyle, ...).
However, the available information and the explanation of the methodology used are too limited to make this information scientifically useful.
This is a shame, because the Zeitgeist is just the tip of the iceberg. There must be an enormous amount of information available.
I know for sure that a few professors I know would have a field day if they were able to analyse all this data.
My question is: would it be possible to open all the available data to scientists for statistical analysis?
It doesn't even have to be free, I think. Universities and research organisations pay a lot of money for surveys that result in datasets that are relatively small compared to the dataset available at Google.
I have heard that Google uses Python extensively to manage its data, grab new data, etc.
As an avid fan of the Python language, I am interested in exactly how Google puts it to use. Can you clue us in?
P.S. - Keep up the good work!
Your question netted 53,496 possible answers. I've filtered out similar answers, so let me just give you the first 1,000...
Says the RIAA: When you EQ, you're stealing bass!
We've all had servers crashing on us just before a deadline. We've all had to go to the office in the middle of the night to prevent a disaster. (And we've all been hacked by a script kiddie once.)
Do you have any stories of disasters or difficult moments in the datacenters that kept you all up for a few nights in a row, but went by unnoticed by the public?
What's the root password?
:)
Kickstart
Well...does he?
The Google image search is wonderful, but shouldn't that have been called Ogle?
Is there anything on the internet that you personally couldn't find with google and if so what was it?
p.s.
Thanks for all your help with my school research
How much is your data worth? Back it up now.
The Google toolbar is one of the coolest things about IE (maybe the only one <grin/>). However, you need a Windows system with IE in order to install and use it. Are there plans to make the toolbar available for Mozilla, and for non-Microsoft systems in general?
She sat at the window watching the evening invade the avenue.
Can you talk a bit about how those weights have changed over time? Have there been any surprising shifts?
--- Jason Olshefsky
Karma: Poser (mostly affected by adding this line long after everyone else did)
Do you Yahoo?
;-)
Has there been a noticeable increase in hit rate since Googlewhacking became popular?
Have you had to take countermeasures?
Also, you already have an Elmer Fudd language; how about Duke Nukem or Yoda?
That man tried to kill mah Daddy
According to the careers section of the Google site, one of the perks of working at Google is having a high-powered Linux workstation. My question is: how well does Linux work as the desktop/workstation OS? What kind of compatibility issues have you run into when working with partners who use Windows? Additionally, what kind of custom software solutions have been needed to make it work?
-ryan
Does this chain of thought keep you up at night?
Miko O'Sullivan
I noticed that Google has free gourmet lunches for all its employees, courtesy of Chef Charlie. My question is: how good is the food, and has Charlie told you any interesting stories from his days with the Grateful Dead?
I have been pondering this question for quite some time, and I think I finally found the one person who might give me the answer:
;-)
Dear Mr. Silverstein. If you could have everything, where would you put it?
As an addendum to this, what is it about the corporate culture at Google that makes it work so well while other "hip" dot coms went down the toilet? What's the magic ingredient that made Google turn out differently?
Got Rhinos?
Speaking of bookmarks, do you have any plans to offer a bookmarking service?
Miko O'Sullivan
How has Google managed to deal so well with ethical issues in the current economic environment? For instance, how has Google managed to avoid going to pop-up ads, unfair treatment of search results based on payoffs, and other challenges? Most search engines and news sites have already caved in to these methods, citing budget shortages. Google, on the other hand, seems to be expanding existing services and acquiring or developing new ones. How has Google managed to avoid some of the other pitfalls, like clueless corporate officers who push the company into adopting bad technologies or technologies that don't fit the company? How much input and control do the non-management but highly technical types in the company exert over the corporate vision of technology?
Just what impact has Jakob Nielsen had on Google's interface?
icqqm [ICQ:11952102]
From various articles and references on the net, it's clear that Google uses a mix of languages in developing and deploying its services. Languages I've seen cited are Python, Java, and C++. I assume this is not a complete list.
What programming languages do Google developers use, for which tasks are they used, and why?
The real Webmaven is user ID 27463. I don't rate an imposter, because my ID is such a lame-ass high number.
In particular:
1. Do you use any Topic Detection & Tracking techniques?
2. How do you cluster news stories? Do you use a Scatter/Gather approach?
3. Is the news site going to be available through an API?
Marcos
Your search results are undeniably the best available commercially on the web, but what thinking has been given to graphical information visualization? Some new search engines are trying out presenting results topologically. While these may not be very useful, there may be potential. What has Google done technologically in this area, if anything? Are there any plans to explore this avenue?
People who think they know everything really piss off those of us that actually do.
I used to kill hours watching the search requests scroll by on Metacrawler's Metaspy page back when people still used Metacrawler. Any chance we could have something like that on Google? I would *almost* even pay to subscribe to a site where I could watch uncensored Google search requests go by.
I was going to write:
"that could be cool if there would actually be some new content on that site"
But then I saw you just (finally?
yey! askadick rules.
With the success and popularity of Google, I find myself using URLs less and less and just entering names into Google to find places (they are almost always on the first page...). Do you think that you have almost replaced the URL?
Google does a good job of indexing html and has added new types of content over the last few months.
My question is what other contents types do you want/plan to index?
For example I searched Google for about an hour today looking for a solution to my PHP compilation problem with no luck. I turned to IRC and got an answer in a couple of minutes. If Google archived that conversation then the next person to search for the error message would get an exact hit.
/b
[Please type your sig here.]
Why don't you ask the Internet Oracle?
Living better through chemicals
I'm not very familiar with the Google toolbar, but IMHO, Google access from Mozilla 1.0 couldn't be much easier... just type your query in the address bar, then press up-arrow to select 'Search Google for "fubar"' from the bottom of the drop-down menu, and hit Enter... presto! Google search in 2 keystrokes. Add Ctrl to that and you even get it in a new tab. Mozilla rocks! I just wish it had smooth auto-scroll, more customizable toolbars (such as small icons, optional text, Home on the nav bar instead of the personal bar), and native support for Back/Forward buttons on mice, like IE and Opera 6 do.
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
Craig,
When will images.google.com include PNG images in its search base? Why were the image types limited to GIF and JPEG, when most browsers could also display PNG? Now, virtually all non-text browsers support Portable Network Graphics.
Questions done. I'll take this opportunity to thank Google for groups.google.com, the searchable usenet archive. In my opinion, 15% of the total value of the internet is contained therein. Excellent!
- - -
"The sixth sick shiek's sixth sheep's sick."
Can I have a prize for slashdotting google?
I was gunna mod... but I have a re-quest-ion.
Can you guys put up bandwidth graphs for the public to see? Something like an MRTG graphs page showing daily Google request traffic flow, so we can see what overall trends in searching happen during the day.
I would love to be able to see just how massive your traffic is and what it looks like.
And let us know what tools you use to monitor all your stuff.
I work for a large ISP and am interested in how large systems are managed. How do you manage Google? Do you use open source, commercial or roll-your-own monitoring? Do you use an SNMP agent, and if so, to what extent? How fast can you detect a problem, troubleshoot it and fix it?
- Things are the way they are because they're coded that way -
...it's how you use it! =)
Why do you think I keep replying?
The longer this thread becomes, the more people will see it!
how about that!
That's a good question and not "redundant". Let me expand on his/her question:
Is Google actively hiring? If so, what kind of job titles are most important for Google to fill these days?
[I'm not a headhunter]
Chris
Hey Google seems to have all the keys to success. So that would be like looking for a needle in a
When you were selecting the OS to run Google, why did you choose Linux? I'm partial to FreeBSD but I'm pretty sure that you evaluated it and found something a) that you didn't like or b) something about Linux that you liked better. If so, what?
Second part of this question: Do you continue to evaluate alternative operating systems?
Chris
Actually, economic inequity, and not lack of food, is the prevalent cause of hunger. Thus, in a way, making more people rich does create something wrong, unless you consider it OK that one sixth of the world's population faces chronic hunger. Like energy, there is only so much capital in the world...
I am saying ICANN (or someone like them) - not Venture Capitalists - should fund projects like Google.
The ultra-lean user interface is a key factor in Google's success. Now that Google is growing and more and more people are working there, how do you deal with the feature creep of "can we add this here and that there"?
Some new features are great, but how do you draw the line to keep that lean interface?
I was amazed to find out that the new site for Elizabeth Smart was crawled by Google (and ranked first) within only a few days from the kidnapping (and the site being registered). CNN pages are also routinely showing up in Google within hours of being published.
My question is: how do you estimate the rate of change for each site, and how often do you crawl frequently updated sites (and update the index)? What is the range of re-crawl intervals (a few hours to one month?)
thanks a lot,
alex
PS: Congrats for the Webbys.
Any plans to release a Google Toolbar for Mozilla 1.0?
Google also tends to succeed where others have failed--take for instance Google Groups, formerly DejaNews. What motivated that purchase? Google also seemed very interested in augmenting the USENET archive with missing data, by hunting down CDs and other media that were published years before DejaNews started its archive--that seems like a genuine desire to preserve USENET for the ages, so what inspired that? Lastly as a corollary, Google Groups is missing one feature that DejaNews used to implement, but eliminated a year or so before it went down: Deja had been keeping archives of the text posts in alt.binaries.* groups, which can be valuable since many groups have active text discussion; will Google ever re-introduce the ability to search the text messages in the alt.binaries.* hierarchy that Deja used to offer, even if it's limited to the old archive Deja had?
Chasing Amy
(We all chase Amy...)
"The more corrupt the state, the more numerous the laws"-Tacitus
How many physical sites do you use to host your systems? And is this due to network redundancy issues, disaster management issues, or simply real estate issues? If they're all in one site, is it because you feel things are easier to manage that way, or is it a limitation some crazy developer didn't think of?
What I'm getting at is I'd love to work for Google, and actually like some of the current job postings, but I don't want to move to California. (Don't get me started on the reasons.) If Google had sites in other locations, wouldn't it make sense to hire local admins to deal with situations there? And thus the concept of the Google branch office is born...
"We are not tolerant people. We prefer drastically effective solutions"
Google, along with other search engines, filters out explicit content. Is there any other type of content that Google has considered filtering, or currently filters as a matter of practice, and what led to this decision?
Could you tell us a little about the back end of your search engine? It's extremely fast. What database software do you use? What optimisations have you implemented at the operating system level (cluster sizes, raw I/O...)? What type of hardware do you use (disk drives, RAID, CPUs...)? How do you handle load balancing/redundancy?
A/S/L?
I'm interested to know whether Google has ever been successfully cracked. (Errrm, perhaps this is sensitive info.)
How often do you detect cracking attempts?
If this occurs what do you do about it?
Thanks
* * Always question "the National Interest" - 9 times out of 10 it is a cover for evil
You forgot the most important question.
Are there any plans to make a profit? Yeah, I know paying dividends is unpopular with tax accountants, but VCs do eventually say no. So I assume one day Google will have to make enough money to at least pay for itself.
After all, logging onto Google doesn't seem to set off half a dozen porn 'n' casino pages popping up, so they aren't getting their fractions of a cent that way.
How does Google do so much so smoothly?
How about giving a server/rackmount thing to the Wayback Machine, so we can use the power of Google to navigate into the past? (Their search sucks.) Come to think of it, so does /.'s!
Maybe this isn't the right guy to ask this question, but I've always wondered: was something like AdWords planned from the start, or did you guys have to throw them up in a hurry because you found yourselves low on cash?
Also, are there more plans for parts of Google like this that are strictly to make money, or is most of the new coding focused on projects like what can be found on the Labs page?
Is there ever any tension deciding what to focus on: things that make money, or things that are useful or cool? Do the engineers have any input, or are these all management decisions?
And yes, IE6 is WinXP, IE5 is Win98 SE, IE4 is Win98. I think Win2k shipped with IE5 and WinME shipped with 5.5 but I could be mistaken.
Bleh!
What are you planning to do, and when, about the fraud being perpetrated on Google Answers (http://answers.google.com), with questioners paying their own aliases instead of the expert poster who deserves to be paid? They are destroying the credibility of what has the potential to be an absolutely brilliant service.
I think you need to be much more stringent about suspending and banning these abusers if you want this service to take off.
I don't think so. There are easy ways to turn off styling for NS4, leaving the page usable but not so pretty. Anyway, I think it's still possible to make it look good in NS4.
NS4 doesn't deserve any love from us web creators anyway....
Karma: T-rexcellent.
There is the Google Answers service (answers.google.com); the only problem is it's riddled with fraud.
There have been discussions about the legality of caching copyrighted material; do you have any thoughts on this?
Have you considered any technical solutions to this problem?
E.g., a cache.txt file that tells robots whether they're allowed to cache a site, as opposed to robots.txt, which tells them whether they're allowed to index it. Or perhaps the other way around: people could have a cache.txt that tells robots they are allowed to cache a site. This would let people opt in instead of opt out, and would also eliminate a lot of unnecessary pages. (I wouldn't mind if you cached my homepage, but I wouldn't expect anyone to be interested in the cached copy.)
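A minimal sketch of the opt-in version, assuming a made-up syntax borrowed from robots.txt: `User-agent` lines start a group, and hypothetical `Cache-allow`/`Cache-deny` lines (not part of robots.txt) list path prefixes. The function names are also hypothetical. A crawler following it would cache a page only if the file exists and the longest matching prefix allows it:

```python
def parse_cache_txt(text):
    """Parse a hypothetical cache.txt into {user_agent: [(allowed, path_prefix), ...]}.
    User-agent lines start a group; Cache-allow / Cache-deny lines add
    path-prefix rules to the most recent group."""
    groups, agent = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            agent = value.lower()
            groups.setdefault(agent, [])
        elif field in ("cache-allow", "cache-deny") and agent is not None:
            groups[agent].append((field == "cache-allow", value))
    return groups


def may_cache(cache_txt, agent, path):
    """Opt-in policy: no cache.txt, or no matching rule, means do not cache.
    When several rules match, the longest path prefix wins."""
    if cache_txt is None:
        return False
    groups = parse_cache_txt(cache_txt)
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_len, allowed = -1, False
    for is_allow, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len, allowed = len(prefix), is_allow
    return allowed


example = """
User-agent: *
Cache-allow: /
Cache-deny: /private/
"""
print(may_cache(example, "Googlebot", "/index.html"))          # True
print(may_cache(example, "Googlebot", "/private/notes.html"))  # False
print(may_cache(None, "Googlebot", "/index.html"))             # False
```

Defaulting to `False` is what makes it opt-in; flipping that default (and the `None` case) to `True` would give the opt-out variant that behaves like today's caching.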
- We are the slashdot. Resistance is futile. Prepare to be moderated -
This begs the question... how do they cope with hardware failures? Even using the wildly exaggerated MTBF figures published by manufacturers, that's a significant number of failures *every* day. Does Google have dedicated hardware techs running round replacing broken drives, fried memory and faulty power supplies?
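The back-of-the-envelope math behind that worry, as a minimal sketch: with a constant failure rate, N parts with a given MTBF (in hours) produce about N × 24 / MTBF failures a day. The 6,000-server count comes from the Sergey Brin quote earlier in the thread; the per-server part counts and MTBF figures below are placeholders, not real vendor numbers:

```python
HOURS_PER_DAY = 24


def failures_per_day(units, mtbf_hours):
    """Expected failures per day among `units` identical parts, assuming a
    constant failure rate, so each part fails at a rate of 1/MTBF per hour."""
    return units * HOURS_PER_DAY / mtbf_hours


def fleet_failures_per_day(servers, parts):
    """Sum the expected daily failures over every part type in the fleet.
    `parts` maps a part name to (count_per_server, mtbf_hours)."""
    return sum(failures_per_day(servers * count, mtbf)
               for count, mtbf in parts.values())


# Placeholder MTBF figures for illustration only, not vendor numbers.
parts = {
    "disk": (2, 500_000),
    "power_supply": (1, 100_000),
    "memory_module": (4, 1_000_000),
}
print(round(fleet_failures_per_day(6_000, parts), 3))
```

With those placeholder figures the fleet would see roughly 2.6 failures a day. The constant-rate model ignores infant mortality and wear-out, both of which push real numbers higher.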
"The invisible and the non-existent look very much alike." -- Delos B. McKown
http://science.slashdot.org/comments.pl?sid=34509&cid=3742297
This deserves more than a 3. PNG really is the lossless image format of the future, and needs adoption on as many fronts as possible.
Google's image archive is fantastic, but it seems to only archive GIF and JPG files. Given Google's general trailblazing attitude on algorithms and technology, it would seem appropriate for more formats, like PNG, to make it into the archive than just these legacy ones.
Google is crawling Word docs and PDF files nowadays...what other media types are in store for the future?