Behind the Scenes At Google
An anonymous reader writes "University of Wahington TV Presents "behind the Scenes With Google." From the site: 'Search is one of the most important applications used on the internet and poses some of the most interesting challenges in computer science. Providing high-quality search requires understanding across a wide range of computer science disciplines. In this program, Jeff Dean of Google describes some of these challenges, discusses applications Google has developed, and highlights systems they've built, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. He also shares some interesting observations derived from Google's web data.' "
Google is actually a giant super computer which has become self-aware. Every person it "hires" is actually one more person it saps knowledge from. In the not too distant future, it hopes to be able to network every human completely so that it can collect the remaining knowledge on Earth more easily.
Man, that's *so* twentieth century. I came to /. for the bleeding edge in information acquisition technology: realtime optical scanning blocks of glyphs encoding human language.
I can't absorb information I can't copy/paste.
I am from a small, grease-loving country in the north called Ca-na-da.
I fsking hate proprietary video formats. Even worse than other formats!
http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=274
Jeff Dean
Abstract: Search is one of the most important applications used on the internet, but it also poses some of the most interesting challenges in computer science. Providing high-quality search requires understanding across a wide range of computer science disciplines, from lower-level systems issues like computer architecture and distributed systems to applied areas like information retrieval, machine learning, data mining, and user interface design. I'll describe some of the challenges in these areas and discuss some of the applications that Google has developed over the past few years. I'll also highlight some of the systems that we've built at Google, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. Along the way, I'll share some interesting observations derived from Google's web data. Jeff Dean joined Google in 1999 and is currently a Distinguished Engineer in Google's Systems Lab. While at Google he has worked on Google's crawling, indexing, query serving, and advertising systems, implemented several search quality improvements, and built various pieces of Google's distributed computing infrastructure. Prior to joining Google, he was at DEC/Compaq's Western Research Laboratory. He received a Ph.D. from the University of Washington in 1996 working with Craig Chambers on compiler optimization techniques for object-oriented languages.
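The abstract only names MapReduce. As a rough, single-process illustration of the map/shuffle/reduce programming model (a hypothetical sketch, not Google's library or its API), here is the classic word-count example; `map_fn`, `reduce_fn`, and `run_mapreduce` are made-up names.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input document.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # Combine all partial counts emitted for a single word.
    return sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    # Map phase: run the mapper over every (key, value) input record.
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            # Shuffle: group intermediate values by their key.
            groups[out_key].append(out_value)
    # Reduce phase: one reducer call per distinct intermediate key.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

A real system would run the map and reduce calls on many machines and recover from failures; this sketch only shows the data flow the user writes code against.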
proximity search (with adjustable range would be extra nice).
i.e.
((gopher OR shrew OR egret) AND -(mole OR newt)) NEAR(range) ((evil OR "satan incarnate") AND (roe AND -chicken))
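A minimal sketch of how a NEAR(range) test could be evaluated over word positions in a document. Assumptions: the boolean AND/OR/NOT parts of the query above are left out, each side is already a set of single words (so "satan incarnate" is reduced to "satan"), and the function names are hypothetical.

```python
import bisect


def positions(tokens, terms):
    # Sorted word offsets at which any of `terms` occurs.
    return [i for i, tok in enumerate(tokens) if tok in terms]


def near(tokens, left_terms, right_terms, max_distance):
    # True if some left term and some right term are at most max_distance words apart.
    left = positions(tokens, left_terms)
    right = positions(tokens, right_terms)
    for p in left:
        # Binary-search for the right-hand occurrences just before and after p.
        i = bisect.bisect_left(right, p)
        for j in (i - 1, i):
            if 0 <= j < len(right) and abs(right[j] - p) <= max_distance:
                return True
    return False


text = "the shrew was called satan incarnate by the farmer".split()
print(near(text, {"gopher", "shrew", "egret"}, {"satan", "evil"}, 3))  # True
print(near(text, {"gopher", "shrew", "egret"}, {"satan", "evil"}, 2))  # False
```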
"In Italy for thirty years under the Borgias they had warfare, terror, murder and bloodshed but they produced Michelangelo, Leonardo da Vinci and the Renaissance. In Switzerland, they had brotherly love; they had five hundred years of democracy and peace and what did they produce? The cuckoo clock." -- Orson Welles (1915--1985).
here's a hint: it isn't all sweetness and light.
http://www.fuckedgoogle.com/
I wish that the technology channel actually had programs on technology like this. This could also work on Modern Marvels on History Channel. It would also work nicely on Discovery or PBS. It is time for television programming to amaze me again!
Click here or here.
Wow. If anything can melt a university web server, surely a Slashdot posting with a link to a 5.6 Mbps MPEG-2 stream of a Google talk is it.
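Rough arithmetic behind the joke, assuming the talk runs about an hour (as another commenter notes) and that every viewer pulls the full 5.6 Mbps stream directly from the university server:

```python
STREAM_MBPS = 5.6        # bitrate quoted in the comment
TALK_SECONDS = 60 * 60   # assumption: roughly one-hour talk

# Megabits per second -> bytes for the whole talk, for one viewer.
bytes_per_viewer = STREAM_MBPS * 1_000_000 / 8 * TALK_SECONDS
print(f"per viewer: {bytes_per_viewer / 1e9:.2f} GB")

for viewers in (10, 100, 1000):
    # Every concurrent viewer needs its own copy of the stream.
    uplink_mbps = viewers * STREAM_MBPS
    print(f"{viewers:>5} concurrent viewers -> {uplink_mbps:,.0f} Mbps of uplink")
```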
I was reading an article a year or so ago about the corporate offices of Google and how there is a projection of all the latest searches displayed in real time on the wall behind the receptionist.
Now I have some pretty important lists which I need to keep tight control over. The information really ought not be distributed outside my office. However, because of the nature of my business, I must do frequent searches using various search engines to fill in my lists.
How am I assured that my searches remain anonymous and secure with Google?
So, I'm always reading about how unfair the tech world is, because there are so few women joining it. But if you watch the video, the audience is surprisingly full of them.
I wonder when content-based search for media will be possible. Content-based image retrieval for example.
This sig does not contain any SCO code.
I wonder how Google backs up its data, especially the Gmail data. Does GFS support automatic replication?
"Behind the scenes at Google" invokes images of clowns and mimes. Is it just me? Imagine all the people in the world who haven't used the Internet, they probably would get the same impression from the phrase too.
Saskboy's blog is good. 9 out of 10 dentists agree.
It's quite nice to see a large corporation make a contribution to Open Source, especially in such a "R&D-esque" field as supercomputing.
Who said that Open Source only rehashes existing technologies and never does anything new?
Real men don't do backups.
Mediocre or no Linux support is what I find on the video link provided by the story. Why? I hear Google relies on Linux a lot. If this is true, why is Linux support very disappointing? The same applies to GMail, and oh, even Yahoo!
Whoa, whoa.. it's hard enough for us to RTFA but now we've got to WTFV (an hour long one too)?
The average slashdotter has an attention span of 5 secon.. ooh look a birdie!
Can't wait for the "I'm Feeling Lucky" feature on that one!
I think I'm going to hurl. Enough already. They're wicked smart and have an extremely overvalued stock. Great! Let's move on.
one word: short any rally.
Here are the first 12 minutes typed out. i'm sorry i can't do the rest, but open the video and skip forward to 12:00 and go from there. i hope that these 12 minutes of my life typing this will save at least 2 other people 12 minutes of theirs.
(speech from this point...)
lots of people use google, but i want to give you a flavour of what happens and what we are working on for our new systems and products. i'll focus on the interesting problems that crop up when you organize large amounts of information, like we do, and what you can do with lots of data and computational resources. i'll also talk about our engineering organization.
google has a mission statement that i like: to organize the world's information and make it universally accessible and useful. we've moved from web search to mail and news, and to searching books by scanning/ocr'ing them. this mission statement covers everything and means we won't run out of work!
a lot of our issues have to do with scale. we have 4B webpages with an average of 10kb/page, and lots and lots of searches per second. it's a big problem, but you solve it with lots of computers and disks, networked well.
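A back-of-the-envelope sketch of what those numbers mean for storage, assuming "4B" is 4 billion pages and "10kb" is 10,000 bytes, and using the per-machine disk figure from the 88-machine configuration (~7TB total) mentioned a bit later in the talk. It ignores compression, indexes, and replication.

```python
import math

PAGES = 4_000_000_000          # "4B webpages"
BYTES_PER_PAGE = 10_000        # "average 10kb/page", assuming decimal KB
RACK_DISK_BYTES = 7 * 10**12   # "~7TB" across the 88-machine setup
RACK_MACHINES = 88

corpus_bytes = PAGES * BYTES_PER_PAGE
corpus_tb = corpus_bytes / 10**12
disk_per_machine = RACK_DISK_BYTES / RACK_MACHINES   # roughly 80 GB each
machines_for_one_copy = math.ceil(corpus_bytes / disk_per_machine)

print(f"corpus: {corpus_tb:.0f} TB")                                        # corpus: 40 TB
print(f"machines to store one raw copy: {machines_for_one_copy}")           # 503
```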
dealing with scale comes up in a number of areas. hardware/network: what do you use? distributed systems: dealing with unreliable things. algorithms/data structures: processing efficiently and in interesting ways. machine learning/info retrieval: improving the quality of results by analyzing lots of data. user interfaces: we haven't done much on this yet, but it would be interesting to provide new ways to navigate and refine the query, beyond just typing in new query words; i'd expect to see more developments in this area.
one decision we've made is that we tend to build on low-cost commodity PCs. example setup: ibm eserver xseries 440, 8 2-ghz xeons, 64GB ram, 8TB disk = $758,000. what we use instead: 88 machines totalling 176 2-ghz xeons, 176 GB ram, ~7TB disk = $278,000. that's about 1/3 the price, with far more cpu.
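The comparison can be made explicit. A small sketch using the transcribed figures, assuming the 88 machines are dual-processor boxes (88 × 2 = 176 CPUs):

```python
# Figures as transcribed from the talk; the CPU count assumes dual-CPU machines.
big_server = {"price": 758_000, "cpus": 8, "ram_gb": 64, "disk_tb": 8}
commodity = {"price": 278_000, "cpus": 88 * 2, "ram_gb": 176, "disk_tb": 7}

for key in ("price", "cpus", "ram_gb", "disk_tb"):
    # Ratio > 1 means the commodity cluster has more of this resource.
    ratio = commodity[key] / big_server[key]
    print(f"{key:>8}: commodity/big = {ratio:.2f}x")

cpus_per_100k_big = big_server["cpus"] / big_server["price"] * 100_000
cpus_per_100k_commodity = commodity["cpus"] / commodity["price"] * 100_000
print(f"CPUs per $100k: {cpus_per_100k_big:.1f} vs {cpus_per_100k_commodity:.1f}")
```

On these numbers the cluster costs about 37% as much, with 22 times the processors and nearly the same disk.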
google was founded in 97 by two people at stanford working on interesting ways to do search, but they needed new hardware to do this. they'd go to the loading dock and offer to set up machines for other research projects, but keep them for a while themselves to get work done. over time google was formed in 1999, and we've learned a lot since then, such as how to scale better and good datacenter practices.
hosting centers were charging by the square foot, which is strange since their costs come from things like cooling and electricity, so we got good at putting a lot of servers in one place. we are now very good at setting up large clusters quickly, such as our gigantic 2001 datacenter move, configured in 3 days.
if you have that many machines, you have to worry about failure. one machine might fail every thousand days, but thousands of machines mean at least a failure a day. you have to deal with this in software, with replication and redundancy. one nice property of dealing with this problem is that having six copies for capacity reasons also means we have six copies available for distributing requests and load balancing. a lot of the applications we deal with are read-only, which makes handling so many queries easier.
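The failure arithmetic can be spelled out. A sketch assuming machines fail independently with a 1000-day mean time between failures, plus a hypothetical replica picker showing how extra copies serve both fault tolerance and load spreading. The function names are made up, and this is not GFS's actual read protocol.

```python
import random

MTBF_DAYS = 1000  # "one machine might fail every thousand days"


def expected_failures_per_day(machines):
    # Each machine fails with probability ~1/MTBF_DAYS on any given day.
    return machines / MTBF_DAYS


def prob_at_least_one_failure(machines):
    # Assumes machines fail independently of one another.
    return 1 - (1 - 1 / MTBF_DAYS) ** machines


def pick_live_replica(replicas, is_up):
    # Shuffle to spread read load, then return the first replica that is up.
    order = list(replicas)
    random.shuffle(order)
    for replica in order:
        if is_up(replica):
            return replica
    raise RuntimeError("all replicas unavailable")


for n in (100, 1000, 5000):
    print(n, expected_failures_per_day(n), round(prob_at_least_one_failure(n), 3))

# Six copies of some data; two of the machines holding them are down.
down = {"m2", "m5"}
print(pick_live_replica([f"m{i}" for i in range(1, 7)], lambda m: m not in down))
```

With 1000 machines the chance of at least one failure on a given day is already about 63%; at 5000 it is over 99%.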
can anyone confirm that Leni Riefenstahl was behind this film?
The best education consists in immunizing people against systematic attempts at education. - Paul Feyerabend
That's no secret, it's pigeons.
A /. article where we shouldn't hear a whole bunch of "RTFA" posts. ;-)
WTFM? Dunno if that's as catchy.
Given the bias of the site, if that's all the dirt they can dig up, Google must be a pretty good company, and/or the people at that site are just crap at digging up dirt.
Think about it, if someone really hated any of the Fortune 500 companies and bothered to dig up some dirt, there'd be tons more dirt.
I suppose Google is a young company. Give it a few more years and more parasites will have found their way into Google. Then you'll have a lot more dirt.
..has never been more appropriate
it should be _Washington_
Disclaimer: my opinions expressed herein are not necessarily those of Google, Inc.
That having been said, as a long-time insider I have a pretty good idea of what really happens "behind the scenes", and let me tell you: both the conspiracy-theory crackpots and our Slashdot fanboys are quite amusing, but the boring fact is that we are neither trying to take over the world nor the best thing since the second coming of Jesus.
We used to be a very successful startup, yes, and now we are a fairly successful corporation. Yes, there are a lot of smart people working here, but don't fool yourself: "the most interesting challenges in computer science" are being tackled in academia, not in corporations. (Besides, anyone who knows Jeff is perfectly aware that he often tends to grossly exaggerate our importance, but to be honest that is part of his job, and he does it really well.)
All in all, I love working here, and I think there are a lot of very smart people here, but if you think we are the only place on the planet where geniuses cluster these days, you are just not being reasonable. If you want to find real discoveries, you have to look in places where people don't have shareholders telling them what to do. The point is that we haven't done anything new per se; only the scale of our implementations is unprecedented.
For example, in my 20% time (Google allows us to spend 20% of paid work time on personal projects) I am working with KeyKOS right now, and let me tell you, this is what I call innovation. It was done in the '70s, and to this day no mainstream OS has implemented its ideas. I'm sure that when, after a decade or two, a Big Corporation (be it Google, Microsoft, Apple, or IBM) reimplements KeyKOS, the Slashdot crowd will wet their pants screaming "wow, what an innovation!", completely forgetting that it was an innovation back in the '70s, when Norm Hardy et al. were working on it quietly with no buzz or fanfare. Please remember that "The Next Big Thing" is always an old idea, only this time backed with $$$ and marketing. Never forget it, or the people who are worth their salt will consider you uneducated.
Google is constantly giving talks like this at universities. I saw one at Harvard back in the fall.
They aren't really worth reporting on Slashdot, since they all contain the same content.
Hey -- I love Google. Use it every day, and I think they're doing some really neat stuff. But this was an hour-long commercial for Google; to me it looked designed to recruit from college campuses. While I think it's great that Google does this (it sure sounds like a great way to get cheap qualified labor), is it really new or interesting? Or even geeky? So we have redundant clustering, LISP-like patterns, and issues of dealing with BIG stuff. Hasn't the industry already done all of this, like dozens of times? You can't tell me VISA International doesn't handle this size of data, or that General Motors doesn't have some of the same scaling issues. I read somewhere that Wal-Mart has one of the biggest computer systems in the world. To me the signal-to-noise ratio was too far out of whack to make it worth an hour of my time. Just my opinion, folks.
poses some of the most interesting challenges in computer science and information theory and application, database theory and application, and more. It is quite a nice, wide area of possible R&D with great prospects for everyone, be they starters or veterans. And please don't say C.S. includes all that (especially since bashing of I.T. degrees on /. is so fashionable these days); it doesn't.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
When Google was recruiting at Georgia Tech, they stated that one of their founders had the 'vision' of having half of Google be female in the near future.
One of the technical female Googlers mentioned that this was probably impossible, but that by shooting for the impossible you achieve a lot more than you would have otherwise.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
Anyone notice the 2 hotties right around the 8 minute mark? Since when are hot chicks in CS? I gotta transfer to that school!!
Here's a summary of the most interesting part.
Is there actually a way of watching this under Linux? mplayer refuses to stream either of the QuickTime/WMP URLs, and I can't download the files with wget because they use rtsp/mms respectively.
Does it have anything in common with GNU's microkernel efforts?
Does anyone care to post a brief overview of KeyKOS, possibly in connection with and/or comparison to Mach/HURD?
...Google seems to be down a lot lately? Like right now, I can't seem to get to it...what's with that?
ZuluPad, the wiki notepad on crack
Of course, a torrent would be even better, for their bandwidth's sake.
Spine World
buddy, your site sucks. whore it somewhere else.
Short answer: yes it does, and it is actually one of the main reasons why I look forward to using Debian GNU/Hurd in the future. Let me quote my old post from January with some background and interesting links to more information about KeyKOS:
And here is a newer post of mine asking exactly your question about KeyKOS capabilities in connection to the recent development of The Hurd, in the First Program Executed on L4 Port of GNU/HURD discussion two months ago:
Sincerely,
Pan Tarhei Hosé, PhD.
"Homo sum et cogito ergo odi profanum vulgus et libido."
this happened way back in october 2004 and was widely blogged shortly thereafter.
:/
I study physics, and even an advanced college vector algebra class isn't as boring as this.
Can't they ditch the speaker and employ someone with passion?
Are all colleges in the USA as boring as this one (since it's the washington university, with quite some prestige...) ????
Considering that there isn't any magical alchemy going on behind the scenes, google is in fact pretty boring. The only thing interesting is the scale of the operation.
Dan East
(finally able to post for the first time in two weeks - wonder if anyone else had a problem)
Better known as 318230.
seriously, for the unwashed, what is KeyKOS and why are you interested?
"KeyKOS ® is a persistent, pure capability operating system."
Doesn't tell me (a non-CS major) anything useful about it at all.
Can you be Even More Awesome?!
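For the non-CS reader, a toy Python model of the "capability" half of that definition: a program can only use a resource through a reference (in KeyKOS terms, a key) it has been handed, and it can pass on weaker copies but never stronger ones. This is a hypothetical illustration, not KeyKOS's interface; Python cannot actually make references unforgeable (code could reach `_target` directly), which a real capability OS enforces in the kernel. The `Capability` and `File` classes are made up, and persistence is not modeled.

```python
class Capability:
    """Toy capability: holding this object *is* the permission to use its rights."""

    def __init__(self, target, rights):
        self._target = target
        self._rights = frozenset(rights)

    def invoke(self, operation, *args):
        # Every access goes through the capability; no global access list is consulted.
        if operation not in self._rights:
            raise PermissionError(f"capability does not grant {operation!r}")
        return getattr(self._target, operation)(*args)

    def attenuate(self, rights):
        # Derive a weaker capability to hand to someone else: rights can only shrink.
        return Capability(self._target, self._rights & frozenset(rights))


class File:
    """Hypothetical resource guarded by capabilities."""

    def __init__(self):
        self.data = ""

    def read(self):
        return self.data

    def write(self, text):
        self.data = text


owner = Capability(File(), {"read", "write"})
reader = owner.attenuate({"read"})  # safe to give to untrusted code
owner.invoke("write", "hello")
print(reader.invoke("read"))  # hello
```

Calling `reader.invoke("write", "x")` raises `PermissionError`: whoever holds only the read-only reference simply has no way to write.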
Very good question. And for the answer see this comment, posted above in this thread.
instructions here (scroll down)
I think you get more out of it from watching the video. Not only are there graphs and pictures at some points (like pictures of Google over the years), but you get to hear all the little jokes Jeff Dean makes (he is a pretty funny guy). Also, near the end they show a neat behind-the-scenes interface where you can look at automatically formed clusters of information. It clusters words or ideas together, which is probably used by things like Google Sets and their search engine (try searching for [lotr]: it knows you mean [lord of the rings] and includes that in the search as well).
:(
He talks a little about the future of search (trying to get meaning out of searches, so it can find stuff you are looking for even if it doesn't use the exact word, sort of like what you can do now by using the ~ when you search), and he even makes a pot joke!
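How Google actually does this isn't described here; the comments only speculate that it comes from automatically formed word clusters. As a much simpler stand-in, a sketch of query expansion with a hand-written synonym table (the table and `expand_query` are hypothetical) shows the basic idea behind [lotr] also matching [lord of the rings]:

```python
# Hypothetical hand-written table; a real engine would learn such relations from data.
SYNONYMS = {
    "lotr": ["lord of the rings"],
    "car": ["automobile", "auto"],
}


def expand_query(query: str) -> list[str]:
    # Return the original query plus one variant per known synonym of each term.
    terms = query.lower().split()
    variants = [" ".join(terms)]
    for i, term in enumerate(terms):
        for alt in SYNONYMS.get(term, []):
            variants.append(" ".join(terms[:i] + [alt] + terms[i + 1:]))
    return variants


print(expand_query("LOTR trailer"))  # ['lotr trailer', 'lord of the rings trailer']
```

Each variant would then be searched, with the results merged.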
This was on UWTV (the actual station). I recorded it on our DVR for my family, but they don't really want to watch it.
Maybe Google has done some nifty things with their file system, but can't we forget about it already? Their search hasn't changed much in the past six years. Of course, the fanboys will salivate over Google calculator and Google unit converter, but on the scale of the Internet these "innovations" barely register.
Some of the other search engines are comparable in quality to Google (Teoma, Vivisimo), and may be better, depending on how many points you take away from Google for spam-infested results, too many blogs, too many Wikipedia clones, too many commercial sites, etc. And some sites are so much further along the innovation scale (meet BrainBoost, an artificially intelligent Internet reference desk answering any questions asked in natural English, with amazing quality and accuracy in a very friendly and usable interface) that they put Google to shame.
Future Wiki -- If you don't think about the future, you cannot have one.
Did you mean: "University of Wahington English Department"
And coming soon to Google:
Google Video Transcript Beta
For those too lazy to watch Internet video
Telling the truth is not bias. There aren't two sides to every story. Sounds like you bought the fringe marketing hook, line and sinker: you should think different because you're a unique, special, talented flower. Well, surprise, you're not.
When being different is more important than being right, honest, or truthful, you're either a 15-year-old boy or a borderline schizophrenic sociopath.
I found this video back in February, isn't this a dupe? Anyway my blog post about it also has a link to good paper on the Google File System written up for the 19th ACM Symposium on Operating Systems Principles, along with video of the talk the Google guys gave at the symposium.
You might also want to have a look at my post on Eric Schmidt talking about Google to the Stanford Business School. The post also has a link to a video of Urs Hölzle talking to the University of Washington about clustering at Google.
Both are worth watching...
Al.
The Daily ACK - Eclectic posts by yet another hacker
to the man behind the curtain.
"Hmmm...Let's see...Which little piss-ant company can we take over today?..."
No, wait...that's Microsoft...Hold on! Nope. It's Google, alright...I think...no, maybe it is MS...Ummm...
GET FREE APPLE STUFF!
point bug was embarrassing. Maybe Google should stick to only supplying other sources' answers. Jeeessshhh!
Search WTF?
http://www.google.com/search?q=1%2F0&btnG=Google+
As pointed out by a previous post, there is a mirror available. However, if you really want, you can use this torrent instead. The video is actually pretty interesting, particularly if you are interested in search or distributed systems.
Evan Jones http://evanjones.ca/
Just throw some more hardware at it.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
You should check out EROS, which is an open source OS based on KeyKOS (but updated a bit).
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
ACM talk
Schmidt
Hölzle
From a public speaking point of view, his delivery sucks.
Countless "umms" - very annoying.
Picking / scratching his nose.
Waving his hands around needlessly.
Get back to the lab man.....leave speaking to other people!
how long before we're able to google our own brains for information...
Get your torrents...
Bah. They should just let us run SQL queries on their database. They can handle it.
Their arrogance and self-righteous attitude is disgusting. If you've ever had the unfortunate experience of having to deal with them for anything (unless you yourself are the CEO or CFO of a Fortune 100 or just a billionaire) you can expect:
a) attitude from h*ll
b) refusal to return phone calls
c) no acknowledgement of anything
d) desire to control control control everything
e) cheapskates
f) ignoring requests (clients alike!)
g) bunch of pretty la di da's prancing around "look at us we're all so hip and rich and under 35 and good looking"
h) no humility, humbleness. They think they know they are the best and that's the last word on anything.
i) fookin' ugly pics of those two weirdo founders always touching each other.
YUCK! I use http://jux2.com/ and love it. Even Ask Jeeves is better. They are nice to deal with, and you don't get all the stupid out-of-date "results" that you do with Google.