How does Google do it?
Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."
If truth is the first casualty of war, openness is the first casualty of going public
OK - I can (perhaps) see this as being the case prior to an IPO, but that statement can't be true after it has happened...
I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!
PigeonRank! Duhhhhhh
> If truth is the first casualty of war, openness is the first casualty of going public.
Maybe this is the reason after all, but I think it's more about Google being simple, smart and clean. They play fair (no browser interstitials, no sneaky crap, no registration necessary, etc.); I would attribute Google's victory thus far to a no-nonsense attitude to business, always, no exceptions.
The dangers of knowledge trigger emotional distress in human beings.
I read the article and it didn't say much at all about how Google operated. Instead, it just said we don't know how they operate because they keep it secret. But maybe that was the point to begin with.
-Vic
The only things it's missing now (IMO) are spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these 3 things, what's next... Google ISP? Google National Army?
Having been a consultant at their data center a year or so back, I can attest that they had well over 50,000 machines. I am not sure about the 80GB drive per machine, because from what I understood they bought whatever drive was cheapest per MB at the time and would replace any dead ones with larger ones. Also, at any given time machines just die, and many of them are not replaced or repaired for months. Their cluster accounts for all this...
-eric
One -- Slashdot seems to be into content-directed ads now... as google was my ad for this story.
Two -- If you want your pages indexed faster and more frequently, sign up and place a google adsense ad on your page. Many webmasters believe that google is having to index so many adsense pages that it is difficult for google to add many more non-ad-driven pages.
Just sign up for adsense and run it a couple of weeks while you build your site. After google has spidered your site well, then just drop adsense.
Good luck. I would love to hear any of your google-related tricks.
AC
They will not have to disclose the number of machines, the OS, the anything related to the machines. Wall Street isn't buying their technology, they are buying their cash flow.
If you do not believe me, buy a share of GE. Pick up the phone, call Investor Relations and ask them how many Unix computers they have and what OS and patch level they run.
"Google has been at 4.285 billion pages for more than three months straight. The count hasn't increased in a long time... The index is maxed."
Hmm... are they using a 32-bit integer to keep the page count?
2^32 = 4,294,967,296, or about 4.295 billion, pretty close to 4.285 billion pages.
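A quick check of that arithmetic, as a sketch only: the 4.285 billion figure is the one quoted above, and the idea that page IDs are stored in an unsigned 32-bit integer is this thread's speculation, not anything Google has confirmed. The variable names are made up for illustration.

```python
# Speculative check: how close is the reported index size to the
# largest count an unsigned 32-bit integer can represent?
UINT32_LIMIT = 2 ** 32            # 4,294,967,296 distinct values
REPORTED_PAGES = 4_285_000_000    # approximate figure quoted in the thread

headroom = UINT32_LIMIT - REPORTED_PAGES
percent_used = 100 * REPORTED_PAGES / UINT32_LIMIT

print(f"limit:    {UINT32_LIMIT:,}")
print(f"reported: {REPORTED_PAGES:,}")
print(f"headroom: {headroom:,} ({percent_used:.2f}% used)")
```

If the guess were right, fewer than ten million IDs would be left, which would fit a page count that stops moving.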
Newbies...
I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!
Why would a shareholder care about server specifications? Investing is all about money. Read any quarterly report from a public company. Income statement, balance sheet, and cash flow are the primary interests on the numbers side as well as a general roadmap of where the company's heading. Warren Buffett doesn't care if each server has two 80 GB drives, or whether they have four 250 GB drives per server. The only thing that matters is that there are competent people to handle these kinds of "dirty details" that an investor doesn't give a rats ass about.
Take a look at the kinds of information you could expect from Google's quarterly reports.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Google is definitely cracking down on duplicate content. In fact, they've recently patented the concept.
Insert software patent debate (where Google is the default hero due to its geek factor) here...
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
On the other hand, here's the conspiracy theory version: what if Google IS the NSA? The IPO is a smokescreen to try to avert attention. The reason they can't show their true capability is that when the company goes public, only 20% of their hardware will actually go into the public company "Google", the rest of the hardware will still be hidden and a part of the NSA's system. :-)
[For the humor impaired, I'm just joking, but it does make you wonder...]
Craig Steffen
http://www.craigsteffen.net
GIMMEE would be nice. Well, nice for a while, and if they didn't get weird with it. Don't know if that could happen though, nature of man and all that philosophical stuff. Goes along with the current VoIP articles. They would dominate the net if they implemented that. I know I would pay them cash to have a universal, works-great, any-OS VoIP and a no-spam, no-commercial email service.
So far we know they have just a cubic load of servers, most likely the most on the planet for one private company. The government probably has more, but it's a mishmash of them, not nearly as sleek or coordinated, AFAIK. What COULD be next with them? Practical, cheap 50 dollar thin clients that you could do a TON on, using distributed computing, from games to communication to running any business? With tech savvy like they've got, their already established heavy hardware base and their heavy commitment to R&D, they could just 'splode with an extra 25 billion in cash all of a sudden from an IPO. OR, the money could get to them and they become just another weird company that forgets its roots as "brains come first" and switches to "marketing crap comes first" like certain other unnamed megacorps do now.
Interesting times
For those who haven't read - there is an article written by Brin and Page - maybe a little outdated, but still interesting: The Anatomy of a Large-Scale Hypertextual Web Search Engine
Unfortunately the technology spending IS part of the cash flow. "We went dumpster-diving and picked up a dozen new machines for the indexing farm" and "we entered agreement with Dell to secure a reliable source of cheap Intel servers" would both show up on the shareholder statements, but the impact would not be the same.
Going public WILL expose a significant portion of Google's technology, more so when it has to do with hardware.
The problem with that analogy is that what software they run has absolutely nothing to do with what they do to make money.
With Google, their entire "business" - their means of generating cash flow - relies on sheer quantity of computing muscle and high performance software for their search databases. With GE, their business is making lightbulbs, dishwashers, hair dryers, electric motors and any of thousands more products used in residential, commercial and industrial settings. How many Unix computers they have in all their offices around the world is a consequence of doing business, not their means of doing business.
I'm sure if you asked the GE Investor Relations department something relevant about how their business operates, you might get somewhere.
=Smidge=
Recycling without attribution is the first casualty of bad journalism.
I thought I had read this article before, and then I realised, I had read it before...
(although I now realise that you are not supposed to read the linked articles before posting comments - sorry)
Humorous signatures are over-rated.
Google search for the letter "a" resulted in 3,530,000,000 hits [search took 0.12 seconds].
Neat. I wonder what doing a Google search would return for other letters:
"c" -- 299,792,458 hits
"e" -- 2.71828183 hits
"h" -- 6.626068 × 10^-34 hits
"i" -- sqrt(-1) hits
"k" -- 1.3806503 × 10^-23 hits
Looks like Google is definitely busted. They should fix these bugs.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
I bet you wouldn't know you need more than an unsigned 32 bit integer before you hit it.
On a side note I would really like to know which one is page number 1.
Diego Rey
diegoT
That doesn't make any sense. A well-designed system is a transparent one, so Google would have no reason to let you know that they're running out of IDs.
By the way, for supplemental result... By doing a quick keyword search on Google using my domain name, I'm led to believe that pages marked "Supplemental Result" are pages that look like search results. That is, they aren't filled with any real content, other than search results from other engines. Results that could "supplement" your "result" from Google.
With google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, what software they use (compatibility issues).
I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:
"Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...
That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and I put together and know their shit when it comes to security, databases, clustering, etc.
Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
> 1) Why are their terms of service / Privacy Policy so vague?
This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.
> 2) Why does their cookie stay until the year 2038?
Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.
> 3) Why does their Google search bar report information and auto-update without permission?
I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar -- it's not right in your face. People who want it likely don't care if it auto-updates because then they have the most recent version of it.
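On the 2038 date in question 2: a plausible explanation (an assumption, not something Google has stated) is that the expiry was simply set near the largest timestamp a signed 32-bit Unix time value can hold. A minimal sketch of where that ceiling falls; `MAX_INT32` and `latest` are names made up for illustration:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit integer can hold, read as seconds
# since the Unix epoch (1970-01-01 00:00:00 UTC).
MAX_INT32 = 2 ** 31 - 1

latest = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(latest.isoformat())  # 2038-01-19T03:14:07+00:00
```

Any cookie meant to "never expire" on a system with 32-bit timestamps would naturally end up with a date in January 2038.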
The dangers of knowledge trigger emotional distress in human beings.
If that link gets slashdotted, here is another link to a PDF of a PowerPoint presentation.
Good read! This paper (with the discussion of the goodness/fastness of file appends) made me more interested in Prevalence - so much so that I am using it for my new project.
-Mark
Another wonderful speculation about Google infrastructure, which you can find here.
Ah, youthful mod!
You've been (humorously) trolled. I suggest posting in this thread to remove your "+1 Informative", or getting a friend to mod it "Funny".
What the parent is describing is not what Google will do, but what DOS did: the above scheme is how MS-DOS managed memory, except that the "selector" and "offset" were both 16-bit numbers under DOS. (Although "segment" was the more usual term for "selector".) The segment number was shifted left four bits (one hex digit) -- or put more simply but less graphically, multiplied by 16 -- and then added to the offset number, to give the whole or "flat" address. This allowed DOS to use 16-bit numbers to address 2^20 = 1 MB of memory, but since DOS reserved the upper 384 KB for the (remapped) BIOS and peripheral cards, programs were able to address at most 640 KB of memory; the parent's mention of "64 billion pages" is probably an allusion (increased several orders of magnitude) to this DOS limit.
Of course, this was a kludge, pure and simple, required because DOS machines were 16-bit. Among other things, it allowed the same memory locations (all but the very top and bottom memory addresses) to be addressable by several different addresses, and discovering pointer aliasing required calculations that, by their very nature, couldn't be done wholly in the machine's (16-bit) registers.
Consider: segment 4, offset 0 is 4 * 16 + 0 = 64,
and segment 3, offset 16 is 3 * 16 + 16 = 64,
and segment 2, offset 32 is 2 * 16 + 32 = 64
and segment 1, offset 48 is 1 * 16 + 48 = 64
and segment 0, offset 64 is 0 * 16 + 64 = 64:
so all five segment:offset pairs are apparently different but actually point to the same memory location.
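The aliasing above can be checked mechanically. This is a minimal sketch of the segment * 16 + offset rule described in this comment; the function names `linear_address` and `aliases` are made up for illustration, and the sketch ignores the address wraparound that real hardware applies at 1 MB.

```python
def linear_address(segment, offset):
    """Flat address for a 16-bit segment:offset pair: segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both fit in 16 bits")
    return (segment << 4) + offset  # shifting left 4 bits multiplies by 16


def aliases(address):
    """Every segment:offset pair that resolves to the given flat address."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


for segment, offset in aliases(64):
    print(f"segment {segment}, offset {offset} -> {linear_address(segment, offset)}")
```

For address 64 this lists exactly the five pairs given above. Higher addresses have far more aliases, up to 4096 each, since a new segment starts every 16 bytes and an offset can reach 65535.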
Opinions on the Twiddler2 hand-held keyboard?
"To be or not to be"
and I honestly can't see what you are going on about: of the first ten results, eight highlighted the phrase in the page synopsis, one used the phrase as a domain name, and one included the partial phrase "...Or Not To Be."
Note the ellipsis on that last one: it alludes to a larger portion of text preceding the printed portion. And the domain name was found even though the spaces were omitted.
Those aren't irregular results: those are highly intelligent results.
Just because they aren't deterministic enough for you to plug them into a piece of code of your own construction (without compensating Google) doesn't mean that they don't fulfill the purpose of the web search.
What is the difference between a small revolutionary change and a large evolutionary change?
Google is cracking down on dupes? Oh no, Slashdot is doomed! :-)
The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?
Um, you do realize that Google already makes a profit, don't you? I daresay the IPO will puff the value of the company up beyond the rational amount, but that's not 'Enron' -- if you are going to use buzzwords, use the right ones. Enron was a case of internal actors in the company using financial games to siphon off profits and inflate the value of the company on the books. You accusing Google of financial fraud? If you are going to use a buzzword, use 'Yahoo' or something -- a solid company that got its stock price puffed up excessively due to investor mania.
How the hell did this get moderated up, except as 'Funny'?
Results 1 - 10 of about 5,750,000,000 for the [definition]. (0.11 seconds)
Doesn't that imply more than 4.285 billion?
This issue is a bit more complicated than you think.
"Google manages to achieve this with sophisticated techniques for rippling changes through the cluster, yet achieves 100 per cent uptime. This is serious stuff, and there are a lot of IT managers out there who would give their eye-teeth to be able to do it half as well."
Sigh...as an IT manager I can only dream of 50% uptime. Damn you, Google!