Craig Silverstein answers your Google questions
1) I've wondered
by lblack
Google always seems to be early to market with some really highly developed software solutions, and also always seems to have the backbone to support them.
I'm curious -- what drives the innovation? Is it the hardware team advancing architecture to permit the software team more room to play, or is it the software team saying, "Hey, look what we got!" and the hardware team dropping the iron to implement it?
I understand there must be some level of synergy, but is it completely seamless or is one side of the equation effectively driving the other?
Craig:
Actually, the innovation is driven by neither hardware nor software, but by products. We look around and say, "What would be the next great product to have?" and then figure out what software and hardware we need to make that product work and work great. If that figuring goes along the lines of, "Oh, it shouldn't take more than two weeks to get the code ready for public use, so that should give us plenty of time to get the 2000 new machines we'll need ordered, delivered, and installed" -- well, that's the kind of environment in which innovation flourishes at Google. [:-)]
2) Network Management Tools/Technologies
by kaladorn
What technologies help to support the Google server farm? What kind of automated monitoring and trouble reporting tools are in use? Are they home brew, open-source, or COTS with some customization (scripts, etc)? And if you had to point to one area of network management and say "we could use some improvement or some better tools", what would that area be?
Craig:
Almost all the technology we use to support our server farm is home-grown. The system we've built is so efficient we can maintain more than 10,000 computers with a handful of ops folks.
Of course, we benefit a lot from our massive redundancy: Unlike many companies, we don't need to worry immediately if a computer, or two, or a hundred, die, because the dead computers have lots of clones.
The biggest issue when you have more than 10,000 computers is that network management tools based on visualization become inadequate to the task: even if the UI is very good, there's often too much going on (i.e., going wrong) to work effectively. At this level, you really benefit from tools that can not only identify problems but fix them. Of course, it's hard to write general tools for this, since "fixing problems" is typically pretty application-specific.
3) As a market leader...
by Marx_Mrvelous
It's well known that you use Linux in your mega clusters. I was wondering if you have ever been approached by Microsoft, Sun, or HP in an effort to switch to their proprietary OSes.
I can't imagine that you haven't. It must have been a huge decision to invest in one technology, so are you satisfied with what you have?
Craig:
We have been approached by several vendors. However, the advantages of Linux for us are pretty strong: It's an environment our developers tend to be familiar with, it offers unsurpassed tech support (we usually talk directly to the author of a piece of code when we're having problems with it), and it's cheap -- an important consideration when you have over 10,000 computers.
I think Linux works here as well as it does because of our technology culture. Our engineers feel comfortable being a partner in debugging kernel problems. For companies that would like to be able to give bug reports like, "Our network is slow" and have someone else take things over from there, Linux probably is not yet the ideal choice.
There's also a question of "Why Linux rather than FreeBSD?" or another free unix-like OS. We're not really religious about this issue. We used Linux -- as well as other, proprietary Unix variants -- when still at Stanford and were happy with it. My guess is if we had used a different open-source, unix-like operating system, we would have been happy with that as well. We're pretty pragmatic about using what works well for us.
4) Google's inescapable coolness.
by rob_from_ca
How do you avoid business pressures to make short-sighted solutions, and consistently make good, common sense ideas work instead of adopting ones from marketing sources? Not only does Google have the best search engine technology, but you consistently do the "right" thing. Clean, quick homepage, text-only, well-identified ads, interesting research projects, etc. This is the way many search engines start, but they all went the way of the "dark" side instead of adopting the "right" solution. In my jobs, it's been very difficult to execute and justify good engineering (or just common sense) under pressure from the people who control the money. Any advice for driving through well-thought-out decisions instead of adopting the "management fad of the month"?
Craig:
You know, it's this kind of cruel, hard-hitting question that gives the press a bad name. But, rob_from_ca, I know you're not really a member of the press corps -- are you? -- so I'll let it slide.
I think you're right that it's easy for a company to start with a laser-like focus on user experience, but hard to keep it up as the company grows.
I think there are two important factors that have helped Google keep its focus on users. One is that the founders have stayed actively involved in the company. The basics of our company flow directly from them. Larry Page's background is in user interfaces, and that really shows in the design of the site and in every project we do. And both Larry and Sergey Brin firmly believe that if we concentrate on users, everything else -- including money -- will follow.
The other important decision, which I can't stress enough, has been hiring. We've hired people who not only agree with this user-centric view of the world, but embrace it. Knowing what I know now, I'm infinitely impressed by how much our VP of Worldwide Sales and Field Operations, Omid Kordestani, embraced Google's policy of eschewing banner ads in favor of text-based ads, using an advertising system we developed ourselves. It's paid off, but three years ago it was far from a sure thing.
5) Google and IP address.
by Anonymous Coward
Why, in this day and age, does Google continue to penalize sites that are virtually hosted? With IP addresses becoming harder to get/justify every day, why does Google discount the relevance of links that don't come from a unique IP address? Please don't just deny it; I think the Internet community deserves an explanation.
Craig:
I can't just deny it? What are my other choices? [:)] Actually, Google handles virtually hosted domains and their links just the same as domains on unique IP addresses. If your ISP does virtual hosting correctly, you'll never see a difference between the two cases. We do see a small percentage of ISPs every month that misconfigure their virtual hosting, which might account for this persistent misperception--thanks for giving me the chance to dispel a myth!
6) Weighting of heuristics
by jolshefsky
As the web develops, methods of matching a set of search keywords to a set of websites related to those keywords must change with it. I envision that the Google algorithms rank search hits by summing weighted factors such as overall site popularity, META tag keywords, META tag descriptions, TITLE tag contents, text contents, keywords contained in URLs, and so on.
Can you talk a bit about how those weights have changed over time? Have there been any surprising shifts?
Craig:
a) I'm afraid not, and b) No comment.
7) Regression
by Have Blue
The Internet is always described as a distributed system with no single point of failure. Google, however, has quickly become by far the most popular method of locating information. "Surfing" has been killed by modern search technology; it's so much easier to look through Google than the Web itself. If Google were down, I'm sure the Internet would be far less useful.
Do you think Google has become an Internet point of failure? With the competition for larger and larger indexes, is the Internet becoming centralized? Do you think this is a bad thing?
Craig:
It's true the Internet is distributed, but Internet services have never been. We saw that really vividly a few years ago when Network Solutions had a screwup with their root nameservers. As I recall, the Internet was basically unusable until DNS got fixed up again.
I think the growth of search engines is a sign that, in fact, the internet (well, the web in this case), is not becoming more centralized. If it were, then people could use a centralized registry to find whatever they needed to know. As it is, information is spread out throughout the web, so only an index like Google can tie it all together.
8) Favoring Big Guys
by PenguinRadio
Does google's policy of "ranking" the sites that have hits favor the "big guys" over more specific smaller traffic websites? That is, would a story on a site like CNN get a higher ranking in google on a keyword "Gulf War" than say a site (gulfwarveterans.com) that deals 100% with the Gulf War? Do you think you are leading to the commercialization of the web (i.e. the big power players) over smaller sites?
Craig:
Hmm, everything I wanted to say here has already been said in the Slashdot discussion on this question.
But in my own words: Google doesn't actually use traffic ("hit") analysis in its rankings: the rankings are based entirely on how sites link to each other. One consequence of this approach is that sites like gulfwarveterans.com, which maintain a consistent focus on one issue, are more likely to accrue lots of links than a transient news story, even one on a major site.
Indeed, searching for "gulf war" on google turns up two Gulf War veterans sites in the top 5, including gulfwarvets.com.
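To illustrate what ranking "based entirely on how sites link to each other" can look like, here is a minimal sketch of link-based ranking by power iteration, in the spirit of the published PageRank idea. It is a toy under stated assumptions, not Google's production ranking: the function name `link_rank`, the damping value of 0.85, and the page names are all made up for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Toy link-based ranking by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to a focused site,
# only one of them also links to a transient news story.
toy_links = {
    "a": ["focused_site"],
    "b": ["focused_site"],
    "c": ["focused_site", "news_story"],
    "focused_site": [],
    "news_story": [],
}
ranks = link_rank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the focused site ends up with the highest score simply because more pages link to it; no traffic figures enter the calculation at all.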
9) Dot com changes?
by Telastyn
Last I heard Google was still the stereotypical "startup" type company; promoting morale over bureaucracy as long as the work got done. Hockey, pool, the Grateful Dead's ex-chef (iirc?), and tons of other perks.
Did Google keep the atmosphere as you've grown? Did they keep it while others tanked?
Craig:
We still pay a lot of attention to making Google a place people like spending their time. The latest is a massaging chair we imported from Japan, so people could get massages even when it's not our masseuses' regular working hours (and they use the chair, too!).
We set this up from the beginning. (Healthy Choice granola bars in the breakroom: that was Sergey. All the M&M's you could eat: that was me.) We still see advantages to it. We think these efforts help productivity rather than hurt it. When you and a co-worker discuss an idea in a conference room, that pretty much limits the communication potential to just you two; when you discuss it over a game of pool, soon half the company has wandered by and had the opportunity to comment.
10) Google's first programming contest
by PK_ERTW
Google recently ran its "first annual programming contest," with a winner receiving $10,000. Many slashdotters suspect this was simply a way to recruit new talent. So, was finding new people one of the initial goals for this project, and have you hired any new programmers as a direct result of it? What were the other goals (PR, generation of new ideas, etc), if there were any?
Craig:
The main goal was to have fun and to get people thinking about what they can do with large quantities of information. If we got people excited about the field of search or data mining -- even if they never submitted a program to us -- then that entire area of research benefits, and ultimately Google benefits as well.
The fact that our terms and conditions mentioned that we retained non-exclusive rights to whatever people submitted hints at our attitude. If really good ideas came out of the program, we wanted to be able to use them. On the other hand, we weren't using the contest as a substitute for consulting or anything (or else we would have demanded exclusive rights). And if the authors of the good programs wanted to come work for us, so much the better. For people who were excited by this project, we already knew there was a cultural fit.
[The following question was added to "the list" by Craig -- ED]
11) Forget Craig
by Talisman
No offense to Mr. Silverstein, but I'm much more interested in Cindy [McCaffrey]! Beautiful, highly successful nerds are terribly rare!
Just so I'm not off-topic: Mr. Silverstein, how does Cindy look in tight sweaters?
Craig:
If you did any research at all, you know that Cindy is our Vice President of Corporate Communications. As such, she takes an active role in Google interviews, such as this one. In fact, she's looking over my shoulder even as I type this. And, let me just say, she ... Hey Cindy, what are you doing? No, don't press that button! Hey! erwqu8poxasewrvNO CARRIER
RTFI. Read the fscking interview.
[The following question was added to "the list" by Craig -- ED]
The interviewee added that question, not some sexually repressed teenage nerd. Oh, and it was humour. Yep.
Check her out here.
There is an interview with Craig available as MP3 (over 70 minutes) that deals with details of the technology at Google and how it has changed since Mr. Silverstein started at Google.
You mean like this?
So who is Cindy McCaffrey? Google knows! Super cool. Big thanks to the guys at Google for the search engine and the interview :)
JOhn
Well, let's see. Assuming 100 watts continuous power consumption per server and an electricity rate of $0.15 per kilowatt-hour, we have:
0.1 kW * 10,000 servers * (365 * 24) hours * $0.15 per kW-h
which is $1,314,000 per year, just to run their server farm.
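For anyone who wants to check that arithmetic or plug in different numbers, here is a quick sketch. The function name is made up for illustration, and the inputs are just the assumptions from the estimate above (100 W per server, $0.15 per kWh, running around the clock).

```python
def annual_electricity_cost(servers, watts_per_server, dollars_per_kwh):
    """Yearly electricity cost in dollars for servers running 24 hours a day."""
    kilowatts = servers * watts_per_server / 1000.0  # total draw in kW
    hours_per_year = 365 * 24
    return kilowatts * hours_per_year * dollars_per_kwh


cost = annual_electricity_cost(10_000, 100, 0.15)
print(f"${cost:,.0f} per year")  # $1,314,000 per year
```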
It didn't get modded up, that's the one that he picked out to answer.
Let's see here. Using rudimentary figures we can compute the cost of electricity for those computers alone. Let's assume that the average computer draws 3.5 amps with 120 volts being supplied to it.
From Kirchhoff's law (either him or Ohm, I forget which) we get Power (watts) = I (amps) * V (volts)
therefore
P = 3.5 amps * 120 volts * 10,000 computers
= 4,200,000 watts (egads!)
Mind you, this is per second! Let's convert this to an hourly figure.
Watts = 60 * 60 * 4,200,000
= 15,120,000,000
Let's convert this to kilowatts for simplicity
= 15,120,000 kilowatts
Now, let's say that the power company charges $0.00346 (number pulled from thin air) per kilowatt hour. Figure the computers are running 24 hours a day.
Cost per day = 15,120,000 * 24 * 0.00346
= $1,255,564.8
Something tells me that my figures had better be damn well off, or else there is some serious cash floating around there!
thought I'd use my favorite image search engine to find this picture... then I realized my favorite image search engine is GOOGLE.
I'll say one headless machine is drawing in the neighborhood of 200 watts. That's .2 kW. Per day, that's 4.8 kW-hours. One kW-hour costs in the neighborhood of a nickel, so one machine costs about 25 cents to run each day. 10,000 machines cost about $2500 in electricity a day altogether.
You made a number of mistakes. The main one is: there is no notion of "watts per second" unless you are talking about a rate of change of power. 4.2M watts is a rate; it is the same whether measured over an instant, a second or an hour. If you use power at that rate for an hour you use 4.2M watt-hours. Not 15.1 billion watt-hours. So that was a factor of 3600 on the high side. Then you were off by a factor of about 14 on the low side in pricing kilowatt-hours. And I would say you were about a factor of 2 on the high side in wattage of a server. Altogether you were high by a factor of just about 500x which is the difference in our results.
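Here is a rough sketch that puts the two estimates side by side. The helper name and the `bogus_factor` parameter are made up for illustration; the parameter only reproduces the mistaken multiplication by 3600 so the two results can be compared. All inputs are the figures quoted in the two posts.

```python
def daily_cost(servers, watts_each, dollars_per_kwh, bogus_factor=1):
    """Daily electricity cost in dollars for servers running 24 hours a day.

    bogus_factor exists only to replay the mistaken "per second" step.
    """
    kilowatts = servers * watts_each / 1000.0
    kwh_per_day = kilowatts * 24 * bogus_factor
    return kwh_per_day * dollars_per_kwh


# The earlier estimate: 3.5 A * 120 V per machine, $0.00346 per kWh,
# plus the erroneous factor of 60 * 60.
original = daily_cost(10_000, 3.5 * 120, 0.00346, bogus_factor=3600)

# This estimate: 200 W per machine at a nickel per kWh.
corrected = daily_cost(10_000, 200, 0.05)

ratio = original / corrected
print(f"original:  ${original:,.0f} per day")
print(f"corrected: ${corrected:,.0f} per day")
print(f"ratio:     about {ratio:.0f}x")
```

Without rounding to 25 cents per machine, the corrected figure comes out at $2,400 a day rather than $2,500, which is why the ratio lands a little above 500.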
I want the site to rank well in Google for the topics and products it covers, so how do I avoid screwing up with virtual hosting?
1. Virtual host by hostname, not by path. GeoCities hosts by path (www.geocities.com/$user/rest.of.url); Freeservers hosts by hostname (pineight.8m.com/rest.of.url).
2. If you feel that your ranking is still not high enough, then bid on keywords.
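To make the hostname-versus-path distinction in point 1 concrete, here is a small sketch of how a server might pick a site under each scheme. The lookup tables, document roots, and function names are hypothetical; real web servers do this through their own configuration, not through code like this.

```python
# Hypothetical document roots, made up for illustration.
HOSTNAME_SITES = {"pineight.8m.com": "/srv/pineight"}
PATH_SITES = {"someuser": "/srv/someuser"}  # as in www.geocities.com/<user>/...


def resolve_by_hostname(host_header, url_path):
    """Hosting by hostname: the Host header alone selects the site."""
    host = host_header.split(":", 1)[0].lower()  # drop any :port suffix
    root = HOSTNAME_SITES.get(host)
    return None if root is None else root + url_path


def resolve_by_path(url_path):
    """Hosting by path: the first path segment selects the site."""
    parts = url_path.lstrip("/").split("/", 1)
    root = PATH_SITES.get(parts[0])
    if root is None:
        return None
    rest = "/" + parts[1] if len(parts) > 1 else "/"
    return root + rest


print(resolve_by_hostname("pineight.8m.com", "/rest.of.url"))
print(resolve_by_path("/someuser/rest.of.url"))
```

With hosting by hostname, every site has its own name even when many sites share one IP address, which is the setup the interview answer says gets treated the same as a unique IP.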
You can guesstimate trawling linux-kernel archives; the Google guys were having random lockup problems with early 2.2.x series kernels. Turned out they were in the IP stack. A kernel hacker asked for tcpdump logging and the Google guys explained they were getting (hundreds? thousands?) of connections per system per second.
Since Google uses Rackable Systems 1U boxes (mostly), they can put 80 in a telco cabinet along with a couple of switches. About double the normal capacity of a cabinet. That means about double the power draw per cabinet.
Exodus reworked their pricing after Google forced them to rewire a bunch of cabinets to handle double the power draw.
As of a year ago when a couple of the Google techies gave a talk at a BayLISA meeting, they had four data centers, two on the west coast and two on the east coast.