What's It Like to be Google's Boss Techie?
We'd like to welcome Google Director of
Technology Craig Silverstein as our next Slashdot
interview victim... err... guest. You think you
run a big Linux server farm? Craig's is bigger.
Think your Web site gets a lot of traffic and
creates a lot of headaches? Just think what Craig
must face! Post whatever you'd like to ask Craig
below, one question per post. About 24 hours after
this runs we'll email Craig 10 of the
highest-moderated questions, and we'll post his
answers shortly after he gets them back to us.
Google always seems to be early-to-market with some really highly developed software solutions, and also always seems to have the backbone to support them.
I'm curious -- what drives the innovation? Is it the hardware team advancing architecture to permit the software team more room to play, or is it the software team saying, "Hey, look what we got!" and the hardware team dropping the iron to implement it?
I understand there must be some level of synergy, but is it completely seamless or is one side of the equation effectively driving the other?
Leem
Does google's policy of "ranking" the sites that have hits favor the "big guys" over more specific smaller traffic websites? That is, would a story on a site like CNN get a higher ranking in google on a keyword "Gulf War" than say a site (gulfwarveterans.com) that deals 100% with the Gulf War? Do you think you are leading to the commercialization of the web (i.e. the big power players) over smaller sites?
but I noticed a few months ago that Cisco now uses the Google engine to search the CCO. Congrats on that one. I've also noticed this new search box that Google is starting to produce. And it looks *very* cool. So my question is basically: which is more important to your job, the website or selling the service and the engine to people who need it?
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
I am wondering why they chose Linux. Specifically, I wonder how they made the choice between all major OS-es (Linux, *BSD, Solaris and possibly Windows), as well as the software they use to power the site.
The Internet is always described as a distributed system with no single point of failure. Google, however, has quickly become by far the most popular method of locating information. "Surfing" has been killed with modern search technology, it's so much easier to look through Google than the Web itself. If Google was down, I'm sure the Internet would be far less useful.
Do you think Google has become an Internet point of failure? With the competition for larger and larger indexes, is the Internet becoming centralized? Do you think this is a bad thing?
What are you doing to prevent the new generation of more sophisticated search engine spammers- spammers that use advanced software such as WebPosition Pro, spammers that feed fake pages to the Google crawler, spammers that make bogus link pages to their own sites? Doesn't this new level of sophistication on their part mean that in large part Google must emphasize human website reviewers, such as those provided by the Open Directory Project, to a greater degree?
As a new network configuration guy, I am often stumped by a problem. I usually turn to google first, and my supervisor second. What has been the biggest problem that you have dealt with that will stand out in your mind years from now? As the "Head Techie", where did you turn, and what was the eventual resolution?
I'd rather you do it wrong, than for me to have to do it at all.
I understand that Google was using large numbers of IDE drives in lieu of more expensive but individually faster SCSI devices. What prompted the decision, and how have the concerns of reliability and performance been mitigated? What special technology, if any, was used to implement such a system?
...as to what exactly Google does with the concepts it receives through the various Google-tech contests held. Have these ideas been made good use of? Do we see any of this in the Google we use every day? What about the ones that didn't win, do we see any of them?
Is there anything new that Google is working on that is not currently displayed in your labs section? If so could you explain it to us?
If you could sum it up in a nutshell, maybe you should be writing O'Reilly books. --- Domasi 2001
as Google got more popular and eventually reached the status it holds today, did you feel any pressure (either internally or from outside the organization) to switch from a Linux based cluster to a proprietary solution (Windows comes to mind, but there are others)? Were you (or others at Google) affected by any of the FUD that is put out, and did it affect your perception of Linux as a viable solution?
Does Google use any natural language processing (when dealing with web pages, queries, etc.)? Are you planning on doing more with NLP in the future?
She sat at the window watching the evening invade the avenue.
How have these affected you and your job, and what are your feelings on this subject?
Xaotik Designs
Why in this day and age does google continue to penalize sites that are virtually hosted? With IP addresses becoming harder to get/justify every day, why does google discount the relevance of links that don't come from a unique IP address? Please don't just deny it, I think the Internet community deserves an explanation.
Google recently ran its "first annual programming contest," with a winner receiving $10,000. Many slashdotters suspect this was simply a way to recruit new talent. So, was finding new people one of the initial goals for this project, and have you hired any new programmers as a direct result of it? What other goals (PR, generation of new ideas, etc.) were there?
Engineers aren't boring people, we just get excited about boring things.
It's well known that you use Linux in your mega clusters. I was wondering if you have ever been approached by Microsoft, Sun, or HP in an effort to switch to their proprietary OSes.
I can't imagine that you haven't. It must have been a huge decision to invest in one technology, so are you satisfied with what you have?
Moderation: Put your hand inside the puppet head!
Recently, the English division of our company [Black and Decker] hired 'HyperMedia Trafficing' or some other similarly named company to get them 'more exposure' in the search engines.
[forget the ethical debate about that
.. or why no one bothered to ask me what to do.]
What I want to know is - going forward - as more and more of these companies start up, and discover more and more unscrupulous ways of 'loading' the search engines with bogus hits/visits/data/etc.
.. How does Google plan to make sure they are :
1) Not losing ad $$ to these folks
and
2) preventing every search from returning something like www.hotgrannysex.com or www.top50.com as the 1st (or first 15) results for a search on
.. well .. pretty much anything.
--Ne auderis delere orbem rigidum meum, non erravi pernicose!
Hi Craig!
I think Google absolutely rocks. It has by far the most intelligent/helpful search engine results. Thanks for the great service.
Now onto the questions- what is the Google vision / strategy for the future? Where can Google go? From a search engine perspective, what are some of the challenges that you have and improvements that can be made (perhaps speeding up crawling to make the latest content available, for example)? How are you going about solving these challenges, and when can we expect them to be implemented?
On a similar note, I've noticed that recently Google announced a "google box" that allows corporations to take advantage of the google search algorithms and indexing. Any more products like this being planned?
Google is a great free public resource. My concern is that it has to be expensive running a resource like that. I know Google's strategy is somewhat to use the free resource as a loss leader to promote your search technology, but the key word in "loss leader" is "loss". It's a great theory as long as you are able to find people who want and need your search technology.
So my bottom line question is this: Does the web site pay for itself via the advertising? Is there a possibility that someday Google may decide the web site costs too much money to run if you get to a point where your reputation no longer needs the loss leader?
Sometimes it's best to just let stupid people be stupid.
It would be great if you did a documentary feature with TechTv or someone, because it's one thing to read about your facility, but it would be another to see it.
Thanks for all of the help I've gotten from Google.com, I don't think I'd still be in school without it.
Paradesign
PS, even just a photo feature on the site would be nice.
I want 2D games back.
Anyone who has ever needed a piece of information that was on a broken page will agree that the Google page cache is perhaps one of the most underrated and useful parts of your search engine.
There's one problem that everyone has with the cache, however - you don't deep-nest the caching, so that following any links on a cached page will lead to the original (probably broken) site, instead of to another cached page. Is there a technical or legal reason for why it works this way? Any chance we'll see deep caching at some point?
How do you avoid business pressures to make short-sighted solutions, and consistently make good, common sense ideas work instead of adopting ones from marketing sources? Not only does Google have the best search engine technology, but you consistently do the "right" thing. Clean, quick homepage, text-only, well-identified ads, interesting research projects, etc. This is the way many search engines start, but they all went the way of the "dark" side instead of adopting the "right" solution. In my jobs, it's been very difficult to execute and justify good engineering (or just common sense) under pressure from the people who control the money. Any advice for driving through well-thought-out decisions instead of adopting the "management fad of the month"?
Not to be too "X-File'ish", but does there come a point where too much knowledge is captured in Google? A point where anything that doesn't exist in Google doesn't exist, period? Wouldn't that represent a very tempting target for a bin Laden or a John Ashcroft, to try to control how the modern world thinks?
Kind of far out there, I know, but do you guys worry about this kind of thing?
sPh
Many sites, when referenced by Slashdot, crumble under the load. Can you folks see any difference, either to your "main" servers (www.google.com) or your cache servers?
Stupid job ads, weird spam, occasional insight at
Just curious when mod_google is going to be released for the apache webserver. It would be nice to have the power of Google indexing available to those of us without significant IT budgets (i.e. wife won't let me "buy another #$*@! computer").
What's the worst thing ever to happen to the google server farm? (Besides the pigeons gnawing on cables)
Tim Dorr
Owner/Manager
A Small Orange
How does google deal with denial of service attacks, particularly distributed ones?
The rest of us just suck it up with fat network pipes, but a high-profile target like google would be the holy grail of Internet vandals.
Has anyone ever poisoned your DNSes, effectively taking Google down even though the servers are up? Successfully inserted bogus WAN routing info into the Internet, again effectively bringing down Google even though the servers are fine?
What's your worst cracker/net vandal story?
Do you expect widespread usage of RDF/DAML/OWL/TopicMaps for explicit meta-data annotation of web resources, or will it be used only in small circles of specialized content providers like academia, or maybe not at all?
How will Google react? Do you plan to use meta-data provided by web resources if found, and how will you decide if it isn't just made up to get people on some bogus pr0n site (like with those <meta>-Tags today)? Will it someday render the brute-force approach of full-text-indexing obsolete?
Programming can be fun again. Film at 11.
Google has become such an important part of the Internet for millions of average users. With this in mind, my friends and I often joke about what would happen if (knock on wood) Google were to go out of business. I suggest that ICANN should do something useful for a change, and fund Google as an official, non-profit project for searching the net.
Although I have heard that Google turns a good profit, what exactly is preventing Google from becoming a not-for-profit organization? Couldn't Google take the extra income from licensing its search to create better search technologies and pay the employees, rather than make some shareholders rich? Wouldn't this perhaps make Google a more sustainable organization?
Non-Linux Penguins ?
Google is an incredibly popular and effective website. I'm curious about the amount of pressure you have to expand in order to "stay competitive" or "aptly serve consumer's needs". Is there any kind of a push to go the way of yahoo or amazon and try and include EVERYTHING on that simple page? As things evolve, do you really see Google staying the top engine in 3 to 5 years?
indeed..
I've made some really stupid posts to the newsgroups in the past and I used my real name. Can you delete them for me?
What would it take to Slashdot Google? What do you do to avoid this? Have you been Slashdotted before, either from Slashdot itself or from some other link?
Carousel is a lie!
How can you possibly test bugfixes/changes that need to get deployed to thousands of machines? Furthermore, how in the heck do you deploy the changes once they're tested? I understand you probably can't describe the exact process, but perhaps you can enlighten us on some principles learned on the subject of CM on such a massive scale.
Everyone will ask about bandwidth, incoming lines, etc. (All the network capacity and capability stuff). Here's something a little more off the beaten track:
What technologies help to support the Google server farm? What kind of automated monitoring and trouble reporting tools are in use? Are they home brew, open-source, or COTS with some customization (scripts, etc)? And if you had to point to one area of network management and say "we could use some improvement or some better tools", what would that area be?
BTW - Google Rocks! I never use anything else anymore!
-- Mal: "Well they tell you: never hit a man with a closed fist. But it is, on occasion, hilarious."
There has been much debate about what the practical purpose for Google Voice search might be, could you fill us in? Is it really for use in cars?
Check out this to get most of your answers. Shouldn't we be asking him stuff that isn't sitting on their website?
I don't remember what HTTPd they're running but it sure as hell isn't Apache. Someone said that they get 1k hits per SECOND; what do you use to shape that insane amount of traffic? What is the '/search' page coded in? What databases are used to index a terabyte of data? How do those 10,000 nodes find the data they need so quickly? What sort of interlinks are used?
;)
How do you build a cluster like a war machine, in other words?
Are there plans to index audio files (and the audio tracks of video files) so that these could be searched as well? I would guess that existing speech recognition packages could be reused for this purpose so that development would not be too complicated.
Recognizing text in images and videos and indexing that would be a similar task. I know that Google Catalog Search must be doing some OCR already, but I have no idea if this would take too many CPU cycles if applied to all images, or if there are other problems (the images themselves already get downloaded for the image search, so bandwidth should not be the problem).
What is Google doing to keep itself on top? Do you think there is a lot of room for improvement? How do you think web searching can get better?
I have heard that Google uses Python extensively to manage its data, grab new data, etc.
As an avid fan of the Python language, I am interested in exactly how Google puts it to use. Can you clue us in?
P.S. - Keep up the good work!
Can you talk a bit about how those weights have changed over time? Have there been any surprising shifts?
--- Jason Olshefsky
Karma: Poser (mostly affected by adding this line long after everyone else did)
Does this chain of thought keep you up at night?
Miko O'Sullivan
With the success and popularity of Google, I find myself using URLs for places less and less and just entering names into Google to find places (they are almost always on the first page...) Do you think that you have almost replaced the URL?