How does Google do it?
Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."
If Google had chosen to go with a superior platform, they probably would have been able to go public already. When I meet with them I will recommend IIS 6.0 and ASP.NET on Windows Server 2003.
William Stephens
MCSE, MCDST, Well-Respected VBScripting Guru
williams007@yahoo.com,(212)275-4831
> If truth is the first casualty of war, openness is the first casualty of going public.
OK, I can (perhaps) see this as being the case prior to an IPO, but that statement can't be true after it has happened...
I mean... surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers Google has, what their specifications are, and what their current commercial strategy is... surely?!
Google has been at 4.285 billion pages for more than three months straight. The count hasn't increased in a long time... The index is maxed.
Google has recently removed tens of thousands of "duplicate content" sites from its index - where "duplicate content" is as simple as being an affiliate site (e.g. Amazon) and having the same textual item descriptions as many other sites.
Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
Google is wavering.
Gmail is a distraction, a venture into some other space to keep people from noticing that their search product is degrading.
May she last as long as possible...
PigeonRank! Duhhhhhh
A large infusion of cash from some scary-assed three-letter agencies that would be very interested in a centralized repository of the tastes and proclivities of nearly everyone in the world connected to the Internet.
> If truth is the first casualty of war, openness is the first casualty of going public.
Maybe this is the reason after all, but I think it's more about Google being simple, smart and clean. They play fair (no browser interstitials, no sneaky crap, no registration necessary, etc.); I would attribute Google's victory thus far to a kind of no-nonsense attitude to business: always, no exceptions.
The dangers of knowledge trigger emotional distress in human beings.
or at least, a variation on a dupe.
Those who can, do. Those who can't, consult.
Sure would be nice to see some of that amazing tech coming back into the community...
I read the article and it didn't say much at all about how Google operated. Instead, it just said we don't know how they operate because they keep it secret. But maybe that was the point to begin with.
-Vic
The only thing it's missing now (IMO) is a spellchecker and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these three things, what's next... Google ISP? Google National Army?
Having been a consultant at their data center a year or so back, I can attest that they had well over 50,000 machines. I am not sure about the 80GB drive per machine, because from what I understood, they bought whatever drive offered the cheapest price per MB at the time and would replace any dead ones with larger ones. Also, at any given time machines just die, and many of them are not replaced or repaired for months. Their cluster accounts for all this...
-eric
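eric doesn't say how the cluster copes with machines that stay dead for months. One common approach, assumed here rather than described in the post, is to keep several copies of each piece of data on different machines and read from whichever copy is still alive. A minimal sketch, assuming nodes numbered 0..N-1, three copies per chunk placed on consecutive nodes, and independent failures:

```python
from typing import List, Optional, Set


def place_replicas(chunk_id: int, num_nodes: int, copies: int = 3) -> List[int]:
    """Put `copies` replicas of a chunk on distinct, consecutive nodes (ring placement)."""
    if copies > num_nodes:
        raise ValueError("need at least as many nodes as copies")
    start = chunk_id % num_nodes
    return [(start + i) % num_nodes for i in range(copies)]


def read_chunk(replicas: List[int], dead: Set[int]) -> Optional[int]:
    """Return the first replica whose node is still alive, or None if all are dead."""
    for node in replicas:
        if node not in dead:
            return node
    return None


def loss_probability(p_node_down: float, copies: int) -> float:
    """Chance that every copy is unavailable at once, assuming independent failures."""
    return p_node_down ** copies


replicas = place_replicas(chunk_id=42, num_nodes=50_000)
print(replicas, read_chunk(replicas, dead={42}), loss_probability(0.01, 3))
```

With a 1% chance of any one machine being down, three copies leave roughly a one-in-a-million chance that a given chunk is unreachable, which is the kind of margin that would let dead machines wait for repair.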
...didn't answer shit.
There are no answers in the article at all. Just the usual questions about how Google's publicized statistics don't add up.
I lost a couple of sites from Google this month, presumably due to duplicate content; they were nearly verbatim clones of some of my other sites. The original sites are still there, the "clones" vanished from Google. As in, even if I search for those domains directly, I get nothing, where I used to get a cached copy of the sites. They've quite literally vanished from Google's database.
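Nobody in the thread knows how Google decides two pages are "duplicate content". A standard textbook way to catch nearly verbatim clones like these is w-shingling: break each page into overlapping runs of w words and compare the resulting sets with Jaccard similarity. This is a sketch of that general technique, not Google's method; the shingle width of 3 and the 0.9 threshold are arbitrary assumptions.

```python
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, w: int = 3) -> Set[Shingle]:
    """All runs of w consecutive words in the text (lower-cased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(doc_a: str, doc_b: str, w: int = 3, threshold: float = 0.9) -> bool:
    """True if the two documents share at least `threshold` of their shingles."""
    return jaccard(shingles(doc_a, w), shingles(doc_b, w)) >= threshold


print(jaccard(shingles("the quick brown fox jumps"),
              shingles("the quick brown fox sleeps")))  # 0.5
```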
Can you back up your assertions that Google's index is full? It's a rather interesting theory, and perhaps an explanation for all the tweaking they've done lately.
"BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
One: Slashdot seems to be into content-directed ads now... Google was the ad shown with this story.
Two: If you want your pages indexed faster and more frequently, sign up and place a Google AdSense ad on your page. Many webmasters believe that Google is having to index so many AdSense pages that it is difficult for Google to add many more non-ad-driven pages.
Just sign up for AdSense and run it for a couple of weeks while you build your site. After Google has spidered your site well, just drop AdSense.
Good luck. I would love to hear any of your google-related tricks.
AC
They will not have to disclose the number of machines, the OS, or anything else related to the machines. Wall Street isn't buying their technology; it's buying their cash flow.
If you do not believe me, buy a share of GE. Pick up the phone, call Investor Relations and ask them how many Unix computers they have and what OS and patch level they run.
'nuff said.
(You may wish to take issue with the above..)
$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.
Google search for the letter "a" resulted in 3,530,000,000 hits [search took 0.12 seconds].
Do you or your partner snore? - Visit www.snoring.com.au
Perhaps this is another form of secrecy - the number of pages indexed never seems to go up, except in huge jumps. According to archive.org, it's been stuck on 4,285,199,774 pages for about a year now :/
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
> I mean... surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers Google has, what their specifications are, and what their current commercial strategy is... surely?!
Why would a shareholder care about server specifications? Investing is all about money. Read any quarterly report from a public company. The income statement, balance sheet, and cash flow statement are the primary interests on the numbers side, along with a general roadmap of where the company's heading. Warren Buffett doesn't care whether each server has two 80 GB drives or four 250 GB drives. The only thing that matters is that there are competent people to handle these kinds of "dirty details" that an investor doesn't give a rat's ass about.
Take a look at the kinds of information you could expect from Google's quarterly reports.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
> more and more people want to know what makes Google tick
Google has already told everyone what makes them tick! Imagine a Beowulf cluster of pigeons.
Karma: -2^0.5 . Mainly due to the imbibing of dihydrogen monoxide
Interestingly, a9.com, which copied Google, contains the same search errors.
On the other hand, here's the conspiracy theory version: what if Google IS the NSA? The IPO is a smokescreen to try to avert attention. The reason they can't show their true capability is that when the company goes public, only 20% of their hardware will actually go into the public company "Google", the rest of the hardware will still be hidden and a part of the NSA's system. :-)
[For the humor impaired, I'm just joking, but it does make you wonder...]
Craig Steffen
http://www.craigsteffen.net
And Google would be nationalised in the blink of an eye as soon as they realised that Google has enough computing power to simulate nuclear weapons :) possibly in real time!
That must be why they're so secretive!
MP3 Search Engine
Putting on my computer scientist hat I would guess:
- instead of backup, hold data in multiple places at once
- use a "cascaded rsync" to trickle software changes to thousands of nodes
- then load software via NFS at node bootup
- use nodes just to store data; keep software in RAM for speed
Just a few thoughts.
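The "cascaded rsync" idea isn't spelled out in the post. One reading, assumed here: the master copies new software to a few nodes, each of those copies it on, and so on, so the number of up-to-date nodes multiplies every round and even a huge fleet is covered in a handful of rounds. A sketch of that arithmetic, assuming every updated node pushes to a fixed number (`fanout`) of new nodes per round:

```python
def cascade_rounds(num_nodes: int, fanout: int) -> int:
    """Rounds until num_nodes have the update, if in each round every node that
    already has it pushes it (e.g. with rsync) to `fanout` nodes that don't."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    have_update = 1  # the master copy
    rounds = 0
    while have_update < num_nodes:
        have_update += have_update * fanout  # each holder seeds `fanout` new nodes
        rounds += 1
    return rounds


# 50,000 machines is the figure from the consultant's post earlier in the thread.
print(cascade_rounds(50_000, fanout=4))  # 7
print(cascade_rounds(50_000, fanout=1))  # 16
```

Serial pushes from a single master would take 50,000 transfers in a row; the cascade needs only as many rounds as the logarithm (base 1 + fanout) of the fleet size.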
That's what I said, idiot. See parent: "2 or so in the top 10 links not even containing that phrase"
Duh. 2 of 10 without means 8 with. Duhhhh....
So how could an open/free alternative service possibly happen?
The only way I can think of is to have a distributed system around the world.
GIMMEE would be nice. Well, nice for a while, and if they didn't get weird with it. Don't know if that could happen though; nature of man and all that philosophical stuff. It goes along with the current VoIP articles: they would dominate the net if they implemented that. I know I would pay them cash for a universal, works-great, any-OS VoIP service and a no-spam, no-commercial email service.
So far we know they have just a cubic load of servers, most likely the most on the planet for one private company. The government probably has more, but it's a mishmash of them, not nearly as sleek or coordinated, AFAIK. What COULD be next for them? Practical, cheap 50-dollar thin clients that you could do a TON on using distributed computing, from games to communication to running any business? With the tech savvy they've got, their already established heavy hardware base and their heavy commitment to R&D, they could just 'splode with an extra 25 billion in cash all of a sudden from an IPO. OR, the money could get to them and they become just another weird company that forgets its roots as "brains come first" and switches to "marketing crap comes first", like certain other unnamed megacorps do now.
Interesting times
For those who haven't read - there is an article written by Brin and Page - maybe a little outdated, but still interesting: The Anatomy of a Large-Scale Hypertextual Web Search Engine
See Google's chastity belt too tight (PartsExpress.com listing removed via SafeSearch because "sex" in domain name) and Google In Controversy Over Top-Ranking For Anti-Jewish Site (Google picking out Googlebombed results) for recent examples.
If you did not know that 10 - 2 = 8, it is a wonder you can even turn on your machine. Or has Mommy let her preschooler use Slashdot?
There is plenty of evidence to suggest that Google has run out of docids, hitting the 32-bit integer limit.
The best evidence is doing a search which returns results which say "Supplemental Result" next to them. That'll be coming from a second document store I'd guess.
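The 32-bit theory is the poster's speculation, not something Google has confirmed, but the arithmetic shows why it is tempting: the homepage figure of 4,285,199,774 pages quoted elsewhere in the thread sits just under 2^32. The sketch also shows that the theory only works if docids are unsigned, since a signed 32-bit integer tops out near 2.1 billion.

```python
UNSIGNED_32_MAX = 2**32 - 1        # 4,294,967,295
SIGNED_32_MAX = 2**31 - 1          # 2,147,483,647
homepage_count = 4_285_199_774     # "Searching 4,285,199,774 web pages"

# Distinct unsigned IDs are 0..UNSIGNED_32_MAX, i.e. 2**32 of them.
headroom = (UNSIGNED_32_MAX + 1) - homepage_count
used_fraction = homepage_count / (UNSIGNED_32_MAX + 1)

print(f"{headroom:,} IDs left, {used_fraction:.2%} of the unsigned range used")
print("fits in signed 32-bit:", homepage_count <= SIGNED_32_MAX)
# 9,767,522 IDs left, 99.77% of the unsigned range used
# fits in signed 32-bit: False
```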
They requoted all of Garfinkel's observations without adding *anything* to it...not a single insightful/informative sentence which adds anything to his article...they might as well have redirected the readers there.
I disagree. An investor deserves to know at least general information about the goings-on of a business. If I were a stockbroker, I would want to know that, say, FruitCompanyA uses insecticide whereas FruitCompanyB doesn't. I personally would choose FruitCompanyA, as a rise in the insect population would ruin FruitCompanyB.
With Google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, and what software they use (compatibility issues).
Honest reporting of operations lets an investor make an intelligent decision about their money and helps avoid boiler-room companies.
"For example, how do you implement security patches and operating-system upgrades (much more frequent in Linux than in proprietary systems from Microsoft or Sun)"
Sustained, thank you :)
...hell, I, the anonymous coward, post a lot of stuff sometimes and don't read the stuff either.
But in this case I think this is an article you really should read. Here is the first paragraph; it's a really great trick, too!
Here's a cheap trick to play on an audience - especially one drawn from the business community. Ask them how many use Microsoft software. Virtually every hand in the room will go up. How many use Apple Macs? One or two - at most. How many use Linux? If the audience is drawn from corporate suits, no hands will show. Now comes the punchline: who uses Google? A forest of hands appears. 'Ah,' you say, 'that's very interesting, because it means you're all Linux users.' Stunned looks all round.
Unfortunately, the technology spending IS part of the cash flow. "We went dumpster-diving and picked up a dozen new machines for the indexing farm" and "we entered an agreement with Dell to secure a reliable source of cheap Intel servers" would both show up on the shareholder statements, but the impact would not be the same.
Going public WILL expose a significant portion of Google's technology, even more so where hardware is concerned.
The problem with that analogy is that what software they run has absolutely nothing to do with what they do to make money.
With Google, their entire "business" (their means of generating cash flow) relies on sheer quantity of computing muscle and high-performance software for their search databases. With GE, their business is making lightbulbs, dishwashers, hair dryers, electric motors and thousands of other products used in residential, commercial and industrial settings. How many Unix computers they have in all their offices around the world is an incidental part of doing business, not their means of doing business.
I'm sure if you asked the GE Investor Relations department something relevant about how their business operates, you might get somewhere.
=Smidge=
If they are playing fair, (i.e. sans "sneaky crap"), then:
1) Why are their terms of service / Privacy Policy so vague?
2) Why does their cookie stay until the year 2038?
3) Why does their Google search bar report information and auto-update without permission?
Google freaks me out after reading this page:
http://www.google-watch.org/
Sorry if that's a bit paranoid, but if you have some counter-information I'd be glad to read it.
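On question 2, the thread never explains why 2038. One plausible reason, offered as an assumption rather than anything Google has said, is the "Year 2038" limit: many systems store Unix time as a signed 32-bit count of seconds, and the largest such value falls in January 2038, so a cookie meant to last "forever" would get an expiry near that date.

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit time_t can hold: 2**31 - 1 seconds after 1970-01-01 UTC.
MAX_TIME_T_32 = 2**31 - 1
limit = datetime.fromtimestamp(MAX_TIME_T_32, tz=timezone.utc)
print(limit.isoformat())  # 2038-01-19T03:14:07+00:00
```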
Recycling without attribution is the first casualty of bad journalism.
I thought I had read this article before, and then I realised, I had read it before...
(although I now realise that you are not supposed to read the linked articles before posting comments - sorry)
Humorous signatures are over-rated.
IANACS (...computer scientist)
Why did you have verbatim clones of sites?
Are you running pr0n sites that exist solely on the purported 'AD' dollars coming your way??
I do not mean disrespect for the pr0n industry... I know that they generate BILLIONS of dollars.
But seriously, what is the general utility/usefulness of numerous identical sites??
-i am not a comp scie...
Blah
They're about to go public. Pumping your stock up involves a stream of "improvements" and conquests after you go public to show investors that your company is king of the hill. Why spend that ammo now rather than wait until it actually generates value for the company?
> much more frequent in Linux than in proprietary systems from Microsoft or Sun
Huh? Does it!? Since when? I like these throw-away lines the media people dish out. What is their basis for this statement? Even when they see Linux obviously succeeding, they dish out a statement like this.
I certainly don't have to patch my Linux boxes as frequently as my Windows boxes. Actually... no... wait, they're right! I only need to patch Windows once. Ctrl-Alt-Del -> Boot Debian CD.
-- main(s){printf(s="main(s){printf(s=%c%s%c,34,s,34
.. or another iPod article..
- It's not the Macs I hate. It's Digg users. -
Larry used to use that in the pitch for institutional investors before VA went public. That's where that came from.
Some comments on the linked article:
> it means you're all Linux users.
What is that - guilt by association?
>how do you implement security patches and operating-system upgrades (much more frequent in Linux than in proprietary systems from Microsoft or Sun) on thousands of servers without causing disruption to service?
You don't implement any security patches and upgrades because those systems are used only by Web servers; it's not like some Web client will hack into their servers... You boot thousands of servers from NFS or such; you upgrade system images once a quarter, together with Google's own software.
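The reply suggests booting nodes from a shared system image and refreshing that image once a quarter. The article's worry was doing this "without causing disruption to service"; one usual way (an assumption, not something the poster describes) is to roll the new image out in batches so only a small slice of the fleet is rebooting at any moment. A sketch, with hypothetical node names and the drain/reboot/health-check step left out:

```python
from typing import Iterator, List


def rolling_batches(servers: List[str], max_down_fraction: float) -> Iterator[List[str]]:
    """Yield successive groups of servers to re-image, one group at a time, so that
    at most `max_down_fraction` of the fleet is out of service at once."""
    if not 0 < max_down_fraction <= 1:
        raise ValueError("max_down_fraction must be in (0, 1]")
    batch_size = max(1, int(len(servers) * max_down_fraction))
    for start in range(0, len(servers), batch_size):
        yield servers[start:start + batch_size]


fleet = [f"node{i:03d}" for i in range(10)]  # hypothetical node names
for batch in rolling_batches(fleet, max_down_fraction=0.2):
    # In practice: drain traffic, reboot from the new image, check health, repeat.
    print("re-imaging", batch)
```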
>yet achieves 100 per cent uptime.
Uptime of what? Of www.google.com, using round-robin load balancing to several geographically dispersed data centers. What's the big deal about that?
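The round-robin scheme the poster mentions is simple to sketch: hand each new request to the next data center in a fixed rotation. The skip-if-unhealthy check below is an addition for illustration (the post doesn't mention health checks), and the data center names are hypothetical.

```python
from typing import AbstractSet, List, Optional


class RoundRobin:
    """Hand out data centers in a fixed rotation, skipping any marked unhealthy."""

    def __init__(self, datacenters: List[str]) -> None:
        if not datacenters:
            raise ValueError("need at least one data center")
        self.datacenters = datacenters
        self.next_index = 0

    def pick(self, unhealthy: AbstractSet[str] = frozenset()) -> Optional[str]:
        # Try each data center at most once, starting where the last pick left off.
        for _ in range(len(self.datacenters)):
            dc = self.datacenters[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.datacenters)
            if dc not in unhealthy:
                return dc
        return None  # every data center is down


rr = RoundRobin(["dc-east", "dc-west", "dc-europe"])
print([rr.pick() for _ in range(4)])   # ['dc-east', 'dc-west', 'dc-europe', 'dc-east']
print(rr.pick(unhealthy={"dc-west"}))  # dc-europe
```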
But I've seen 404s on www.google.com, and the paid AdWords admin site is down quite often (anyone who has ever used it knows what I'm talking about).
> With Google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, and what software they use (compatibility issues).
I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:
"Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...
That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and me put together and know their shit when it comes to security, databases, clustering, etc.
Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.
How the heck is this news? The article just summarised the Simson Garfinkel article for the business types. Slashdot already covered the calculations from Garfinkel, and therefore this is just a repeat! Booring.
What time is it/will be over there? Check with my iPhone app!
Do you know how many servers IBM have? Akamai? Microsoft?
Be reasonable.
Financial information is important, their business plan is important, it is probably important to know that they are running Linux so that SCO-type problems can be factored in. The sort of fine technical details the Observer goes into are totally irrelevant, just an incidental business expense. We know that it all works and that Google are on top of what they do. That is what matters.
Mielipiteet omiani - Opinions personal, facts suspect.
> 1) Why are their terms of service / Privacy Policy so vague?
This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.
> 2) Why does their cookie stay until the year 2038?
Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.
> 3) Why does their Google search bar report information and auto-update without permission?
I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar; it's not right in your face. People who want it likely don't care if it auto-updates, because then they have the most recent version of it.
My favorite Google features:
http://labs.google.com/
http://www.google.com/intl/xx-klingon/
http://www.google.com/intl/xx-elmer/
Let's leave it to Google, facing an IPO, to play these numbers and the PR game how they feel will most benefit them and deter their competitors.
This post is brought to you by Microsoft [tm] Internet Explorer (r), the only browser for the Internet. Remember Mosaic and Spyglass? We don't.
If that link gets slashdotted, here is another link to a PDF of the PowerPoint presentation.
Good read! This paper (with its discussion of the goodness/fastness of file appends) made me more interested in Prevalence, so much so that I am using it for my new project.
-Mark
The meat of the article is just the observation that the numbers Google puts out (number of servers, number of hits, etc.) are inconsistent. The only conclusion it comes to is that Google has more 'horsepower' than it's letting on.
I think this is the wrong question investors need to be asking about Google before they IPO. Sure, it makes for some great geek gab; the fetishistic wonderment of just how many servers Google is running, how many hits they get and how exactly they manage to, well, manage that many servers. In the end though, answering those questions doesn't tell us anything about what Google is actually selling.
Pretty much everyone who uses the internet loves Google, and we do so for a combination of three compelling reasons. First off, Google offers up what is basically the best search engine on the internet. It isn't perfect and it doesn't work all the time, but it is the best thing out there right now. Second, they offer this high-quality search service without all the excess bullshit that got tacked onto all of the other search engines on the market in the .com heyday. While Yahoo was busy playing in Hollywood and becoming a "Portal" and AltaVista was going down the tubes, Google's simple, whimsical, easy-to-use front page didn't get gaudy by trying to make us sign up for accounts or any of the other marketing department crap. Finally, Google has a high Willy Wonka factor, sort of like Apple. We don't hear much from the company in the way of press releases or other information, but every so often they open the doors and it turns out the PhD Oompa Loompas there developed something totally cool. Local search, Froogle, Gmail and Orkut are examples of this...
The more I look at it, the more I fear Google is nothing more than a very well calculated shell game; the Enron of technology IPOs...
The thing that gives me the heebie-jeebies about Google is how they make all of this look so effortless. Orkut just sort of popped out into the open one day. Gmail appeared on April 1 with such an "effortless" air about it all that Google didn't even seem to take its own press release seriously. We keep hearing these cryptic references from the company about some overwhelmingly massive amount of computing power they have and how their cabal of PhDs has it humming along at levels of efficiency that are a world beyond almost everything else out there.
All of this has made for a very pumped-up environment for an IPO, but we have yet to get an answer to the question "What is Google's business model?" I "google" words all day. I have an Orkut account that I use. I even use Google as a quick-and-dirty calculator. When it opens up, I will have a couple of Gmail accounts. The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?
Sure, we can say that Google has integrated advertising within the search results, but the advertising model has always proven to be of dubious effectiveness at best. Google has an enterprise search division, but the cost of their Google Appliance is a pittance compared to the sort of money big-time enterprise software companies like Oracle and SAP are making. How can they survive on that revenue stream and pay the bandwidth bills for all of the free services they offer to the public?
We always tend to answer these questions with an "I don't know, but Google must be doing something right." Google works very hard to keep fueling the belief that they are doing something paradigm-shifting with all of those PhDs they have on the payroll, and how many servers they have, and how they can just sort of effortlessly announce 1 GB free email accounts. We keep getting the impression that these guys must have something HUGE up their sleeves, and they have us salivating for the IPO so we too can be part of it.
Very soon, Google executives are going to pile onto a Gulfstream V and do a roadshow for big-time investment houses and institutional investors, and they are going to be trying to convince these guys to buy into Google's IPO. They are going to be asked exactly what sort of business model Google is going to be pushing, and one of two things is going to happen:
- Google will c
Underpants gnomes.
I'm amazing. You aren't. SUCK IT
Google programming contest pays off again.
Using Phelps and Wilensky's "Robust Hyperlinks" concepts to detect duplicate content.
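Phelps and Wilensky's robust hyperlinks attach a short "lexical signature" to a link: a handful of words (five in their proposal) that are common in the target page but rare elsewhere, so the page can be found again by searching for them. The poster's idea is that pages sharing a signature are probably copies. A rough sketch of that idea, assuming plain TF-IDF scoring over a tiny in-memory corpus standing in for the web, whitespace tokenization and alphabetical tie-breaking; it is not Google's method or the paper's exact algorithm.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def lexical_signature(doc: str, corpus: List[str], k: int = 5) -> Tuple[str, ...]:
    """The k words of `doc` with the highest TF-IDF score against `corpus`,
    returned sorted so that two signatures can be compared directly."""
    doc_freq: Counter = Counter()
    for text in corpus:
        doc_freq.update(set(text.lower().split()))
    term_freq = Counter(doc.lower().split())

    def tf_idf(term: str) -> float:
        # Smoothed IDF: words found in every document score 0.
        idf = math.log((1 + len(corpus)) / (1 + doc_freq[term]))
        return term_freq[term] * idf

    ranked = sorted(term_freq, key=lambda t: (-tf_idf(t), t))
    return tuple(sorted(ranked[:k]))


def duplicate_groups(docs: List[str], k: int = 5) -> List[List[int]]:
    """Indices of documents that share a lexical signature with at least one other."""
    groups: Dict[Tuple[str, ...], List[int]] = {}
    for i, doc in enumerate(docs):
        groups.setdefault(lexical_signature(doc, docs, k), []).append(i)
    return [ids for ids in groups.values() if len(ids) > 1]


pages = [
    "cheap widgets for sale at the acme store",
    "cheap widgets for sale at the acme store",  # verbatim clone
    "pigeon rank explains google search quality",
]
print(duplicate_groups(pages))  # [[0, 1]]
```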
Another wonderful piece of speculation about Google's infrastructure can be found here.
I mean, how else could they do it?
Their success lies with the name.
I mean, google, that's pure genius.
I started using it for the name alone, then found it to be much more useful than HotBot.
"When I look back, my life is not a foreign country, it's more like a library book returned long ago." - ????
As far as I can tell there is no better way for that hardware to have come "back into the community."
The service is free, and they're really good at what they do. I would say I'd be lost without google on the internet, but really this compliment goes for lots of search engines - I'm really very grateful this sort of service still exists for free (well, with ads.)
Unless you want to talk about cures for diseases through protein-folding simulations, I can't think of a better way for this hardware to be used, such that it begets a greater net benefit.
The snow doesn't give a soft white damn whom it touches. -- ee cummings
Akamai?
"When I visited the company in January, the screen said that Akamai was serving 591,763 hits per second, with 14,372 CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes of total storage. On April 14 [2004], the number had jumped to a peak rate of 900,000 hits per second and 43.71 billion requests delivered in a 24-hour period."
From this article.
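The quoted figures mix a daily total with a peak rate. Dividing the daily total by the 86,400 seconds in a day gives the average rate, which puts the peak at a bit under twice the average. A quick check of that arithmetic, using only the numbers in the quote:

```python
SECONDS_PER_DAY = 24 * 60 * 60         # 86,400
requests_per_day = 43_710_000_000      # "43.71 billion requests delivered in a 24-hour period"
peak_rate = 900_000                    # "a peak rate of 900,000 hits per second"

average_rate = requests_per_day / SECONDS_PER_DAY
print(f"average ~ {average_rate:,.0f} requests/s")        # average ~ 505,903 requests/s
print(f"peak/average ~ {peak_rate / average_rate:.2f}")   # peak/average ~ 1.78
```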
Why did I read "tech" as "hardware"? I assume you refer to the people... oh well.
I guess it would be nice for Google to work some of its technology/people into the community. Maybe someday they will.
You betcha! I'm looking at starting up a search engine and would really love their technology for free!
The article never answered any of our questions. Heck, I even looked for a "Page 2" link after reading the entire thing; sadly, the article ended without even attempting to answer its own questions.
This isn't the first article [of late] to attempt to describe how Google works, but it's one of the most recent, and it doesn't include pretty pictures.
We want pretty pictures!!!!
No, they have to have people who understand the technical details to be able to produce legitimate forecasts of output. I'm sure there are people who analyze how many workers and robots Ford has to estimate how many cars it can produce, right? So the equivalent is how many coders and systems Google has, no?
Well if they don't, big brokerage houses can reply and I will consider the most lucrative offer.
espo
Searching for "the" returns:
Results 1 - 10 of about 5,690,000,000 for the [definition]. (0.11 seconds)
But Google homepage says:
Searching 4,285,199,774 web pages
Is there a difference?
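The thread doesn't resolve the mismatch, and nothing here explains it; the arithmetic just makes the size of the gap concrete, using the two figures quoted above:

```python
homepage_index = 4_285_199_774     # "Searching 4,285,199,774 web pages"
hits_for_the = 5_690_000_000       # "Results 1 - 10 of about 5,690,000,000 for the"

excess = hits_for_the - homepage_index
ratio = hits_for_the / homepage_index
print(f"{excess:,} more results than indexed pages ({ratio:.0%} of the advertised index)")
# 1,404,800,226 more results than indexed pages (133% of the advertised index)
```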
Google is spending its time maintaining an unparalleled search engine, and they simply haven't had either the time or inclination to send someone around counting up all the capacities of all the hard drives, so that some fool can ask how much capacity they have.
However, now that the company is contemplating going public, people DO want to know the answers to these questions, and as a publicly traded company, they may be required to answer many of these questions.
So expect to see the company go from mostly techies to mostly lawyers!
Durrrh... that's exactly what I was thinking. :-)
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
If they 'give back some of their amazing technology to the community', they would lose their competitive edge. A Ferrari is fast because of its engine; if they had to 'open source' their engine, we'd see minivans capable of 180 mph.
Is what happens if the entire complex reaches an order of magnitude close to the complexity of the mammalian brain.
And instantly becomes SELF-AWARE.
Google----> First AI?
robots.txt
The Googlebot respects it, so if you're up to no good, it's easy to get Google to not index your page.
Anyway, I'd like to see a version of Google that didn't respect robots.txt. You used to be able to dig up a lot of information on people with Google before a lot of sites started using robots.txt.
Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
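For reference, this is roughly how a crawler that honours robots.txt decides what it may fetch. The sketch uses Python's standard-library `urllib.robotparser`; the robots.txt contents and the example.com URLs are made up. Compliance is voluntary: a crawler that ignores the file, like the one the poster wishes for, can still request those pages.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps Googlebot out of /private/ but lets everyone else in.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "http://example.com/private/notes.html"))     # False
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))             # True
print(parser.can_fetch("SomeOtherBot", "http://example.com/private/notes.html"))  # True
```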
BTW, I was quite surprised to hear the part about Costco's success with its fairer treatment of employees. I much prefer shopping at Costco to shopping at Sam's Club. (At the very least, the free samples of various foods they hand out make shopping less monotonous.) Coincidence? I think not.
Do yourself a favor, slashdot, and get a membership. It's worth it.
Yup, Akamai. That link of yours points to 'Technology Review' (I can't say more because it appears to be slashdotted) and not the 'Wall Street Journal'.
This is just fun stuff that the company chooses to publish; it is not 'Investor Information'.
I say google-watch.org is as credible a site as this one: www.realultimatepower.net. Go ahead, click the link; it's a hilarious site.
Not necessarily. The shareholder information is mostly business information relating to investments, profits, etc. As for the technology used, there is a pretty simple excuse: confidentiality of the technology. For instance, Coca-Cola does not have to publish its secret formula just because it has shareholders to report to. The know-how and technical details are Google's only and most valuable asset, and describing how they index, how they patch and what OS they use would be suicidal.
http://www.automatiq.se
Very simple example of 15 servers in 3U. Many vendors are also offering a "dual dual" system in 1U, that is, two dual-CPU motherboards that fit in one case.
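To put the density claim in perspective, here is the arithmetic for a single rack. The 42U rack height is an assumption (a common full-height rack), the calculation ignores the space switches and power take up, and the 50,000-machine figure comes from the consultant's post earlier in the thread.

```python
import math

RACK_UNITS = 42            # assumption: one standard full-height rack, all of it for servers
SERVERS_PER_3U = 15        # "15 servers in 3U"
MACHINES = 50_000          # "well over 50,000 machines"

servers_per_rack = (RACK_UNITS // 3) * SERVERS_PER_3U   # 14 enclosures x 15 = 210
racks_needed = math.ceil(MACHINES / servers_per_rack)    # 238.1 rounded up to 239
print(servers_per_rack, racks_needed)                    # 210 239
```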
The original analogy is a little off. However, if you look at eBay, do they disclose how many systems they are running? How about Amazon? Do I care?
The real fact of the matter is, they have custom software that they run. The number of systems, speed, memory and OSs are simply a byproduct of what they really offer: a service.
Google is no different. They offer a service. As long as they are profitable, as an investor I couldn't care less if the systems were running on Dells, white boxes, Macs, or Commodore 64s. They have found a way to make the business run on the systems they have.
> Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
It's possible that the index is full, but I would imagine that they would have seen this coming long ago, as it "filled up", and taken measures. What's more likely behind the elimination of duplicate pages is that more and more people have been complaining about the search results relevancy and how site owners have been taking advantage of certain known flaws in the Google algorithm. So, they are taking steps to fix the algorithm, and kill off all the fake sites.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
BTW, how many of you have tried Gmail? I think the interface is incredible... it's hard to imagine that web-based email has been around for such a long time and nobody came up with an interface as nice as this. Perhaps it's just that I have a very fast connection; I'd have to check whether it works as well at slower connection speeds.
As to how Google does it: Google seems to take whatever they do and do it very well! Take even the email user interface, which nobody even bothers to talk about.
Hmmm... 1,000 queries per second at peak time?
Perhaps we should test out their systems...
Google... try slashdotting that...
Come on, the nodes in their clusters are not desktop computers with office software on them.
The systems running on these machines are very stripped down: they need only a few applications and a very simple kernel (few device drivers, maybe no graphics card driver, ...).
Furthermore, there are no local users on the machines, so many security flaws won't affect their integrity. And remote holes in the kernel don't occur very often.
And above all, these cluster nodes are certainly shielded by some sort of firewall, so they don't have to take care of network security themselves.
All in all, I believe such machines need updating rather infrequently, at least for security reasons.
Titus
Maybe they do not give back in terms of technology, but they do give back to the community! I can think of at least two ways. ;-)
1) They provide an alternative to Microsoft. Not only in search: it looks like they will deal a blow to Hotmail as well. They prevent MSN from becoming the portal. I think this is very important; people see that things can be done better than the Microsoft way, and that it can be done with Linux.
2) They make communication within the Open Source and Free Software community much easier. I keep a log of visits to my webpages, and 90% of hits come from Google searches. I almost exclusively use Google to find resources for any project I am working on, including Free Software resources. Without Google, I might have had to filter through a hundred compiler advertisement pages before finding a trick for the GCC preprocessor. Now I type what I need, and it is usually on the first page. Granted, I use Google because I am lazy, but people are generally lazy.
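The 90% figure comes from the poster's own web log. As a rough illustration of how such a share could be computed, here is a minimal Python sketch. It assumes the server writes Apache's "combined" log format and treats any referrer whose hostname has a `google` label as a Google search; the function names and sample lines are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumes Apache's "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def is_google(referer):
    """Heuristic: the referrer's hostname has 'google' as one of its labels
    (www.google.com, www.google.co.uk, ...)."""
    host = urlparse(referer).hostname or ""
    return "google" in host.split(".")


def google_share(lines):
    """Fraction of parseable hits whose referrer looks like a Google page."""
    total = 0
    from_google = 0
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # not a combined-format line; ignore it
        total += 1
        if is_google(m.group("referer")):
            from_google += 1
    return from_google / total if total else 0.0


# Two made-up log lines: one hit referred by a Google search, one with no referrer.
sample = [
    '1.2.3.4 - - [10/May/2004:10:00:00 +0000] "GET /gcc.html HTTP/1.1" 200 512 '
    '"http://www.google.com/search?q=gcc+preprocessor" "Mozilla/5.0"',
    '1.2.3.5 - - [10/May/2004:10:01:00 +0000] "GET /gcc.html HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]
print(google_share(sample))  # 0.5
```

The hostname-label check is deliberately crude; a real log analyzer would match an explicit list of search-engine domains.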
I think Google does give back to community, in a way, they enable us to be a community.
ato
That IS amazing!
And the muscular cyborg German dudes dance with sexy French Canadians
Say I just invested twenty years at Google figuring out search engines.
Now figure I am selling my options.
Now add that more people will buy them at a higher price if they are impressed with the number of computers.
I think there is a big temptation for Google to expose whatever it has to expose if it means getting the option value up.
After they cash out their options, Google can compete, not compete, or whatever; it will be the public's problem.
AIK
These two articles seem really similar. The article at Technology Review has a bit more detail, and also covers Akamai well. Interesting as heck. But is the first page copied a bit, or plagiarized?
;-)
I'm going to go Google for some of those same sentences.
Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety
The two results you mention are obviously wrong. The one that supposedly used the phrase as its domain name did not: it used an ALTERED version of the phrase (with all spaces removed). The partial-phrase result is so obviously off that it needs no explanation, except to add that if partial-phrase errors are OK with you, you are probably someone who would search for President "Andrew Jackson" and be happy that "Michael Jackson" results are in the list.
"Those aren't irregular results: those are highly intelligent results."
As intelligent as Homer Simpson. These are "D'oh!" results. I've seen this in many other searches: I ask for an exact thing and get many resulting pages that do not even contain it.
"doesn't mean that they don't fulfill the purpose of the web search."
I asked for A, they gave me B. It is a big bug: if you read Google's own search instructions, they say it is not supposed to glitch out this way.
"And the domain-name was found even though the spaces were omitted."
The space-omitted version is a glitch; I did not ask for that.
Anyone else notice it's usually an anonymous coward who selfishly demands access to that which they do not deserve access to?
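The disagreement above comes down to two different notions of a match. Here is a minimal sketch of both, with hypothetical helper names (this is not how Google actually matches queries): a strict phrase match requires the words to appear consecutively, while a looser, space-insensitive match would also accept a domain name like tobeornottobe.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens; spaces and punctuation separate tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """Strict phrase match: the phrase's words occur consecutively in the text."""
    t, p = tokens(text), tokens(phrase)
    n = len(p)
    return any(t[i:i + n] == p for i in range(len(t) - n + 1))


def matches_squashed(text, phrase):
    """Looser match: the phrase with its spaces removed equals some single token,
    e.g. the 'tobeornottobe' in a domain name."""
    return "".join(tokens(phrase)) in tokens(text)


domain_page = "Welcome to www.tobeornottobe.com"
hamlet = "To be, or not to be: that is the question."
print(contains_phrase(domain_page, "to be or not to be"))   # False
print(matches_squashed(domain_page, "to be or not to be"))  # True
print(contains_phrase(hamlet, "to be or not to be"))        # True
```

Under the strict rule only the Hamlet page qualifies; the domain-name page appears only if the engine also applies something like the looser rule.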
Mac OS X and Windows XP working side by side to fight back the night.
Yes, we need a Google Linux for the desktop. If Google sells out, what a shame: Warren Buffett will have taken over the planet. The whole idea being: how to make a lot of money without working for it.
If work is defined as 'the expenditure of energy, to the benefit of others, that cannot be achieved by automation', then not many people accumulate money through honourable actions. In other words, they get paid without doing anything for it.
The lads at Google have done a marvelous job, and all the greedy fat capitalist bastards want to do is stick their noses in the trough and suck their sustenance. Whilst shagging the planet.
If I were going to become a serious investor (in my mind, $millions) in a particular company, I would want to know certain details about it.
In your Coca-Cola example, I'd want to know some rough numbers about their production capacity. If they're planning to take on a new market, will they be able to meet the demand? If one facility goes offline (maybe in a terrorist attack), can their other facilities absorb the needed supply?
I would want to know some similar things about google. I would need to know something about their infrastructure. How close are they to maxing their current resources? What will it take to add more? If they lose a data center, can they make up the load?
One of the biggest problems in the market these days is that people invest lots of money in things they don't understand. Then they wonder why they lose all their money. There's going to be a lot of ignorant people who will be investing lots of money in google just because someone else told them it's the next hottest thing. It might be, but that's a dumb reason to put money in it.
Having a PhD does NOT make one smarter; it just means that said person found (or has) the financial means to become more educated; or is it edumicated... I forget.
I can understand how in some cases an IPO can raise the capital necessary to operate and break into new markets, but does this apply to Google? I really don't think so. They have market share; they have resources. Any infusion of funds into the company is more likely to give them the ability to diversify further and enter different markets, which history has shown is, more often than not, a bad business idea.
So one has to assume the IPO is the first phase of the principals "cashing out". The press will probably hail this as a sign of the next dot-com boom, and a bunch of nerds within the company will suddenly become millionaires, then quit their jobs and open a bed & breakfast in some obscure town or join the World Poker Tour. There goes the talent.
> How does Google do it?
What kind of kinky question is that! I don't wanna know!
What did I tell you? -1 baby! Guess where I am? Go on, guess!
No.
No.
Yes, that's right, in heaven! Woohoo, now I dance.
Suck ass? Google used to rock, but they've been fucking up BIG TIME lately. Couldn't care less how they tick. Probably much like Microsoft lately.
Remember, "There is no search business." There is, however, an advertising business.
There is no search business. Read up on these: "Google is out of control" and "There is no search business".
Are low-interest student loans that hard to come by? Am I missing something?
Gmail screenshots that aren't available anywhere else are here
What drives the Google architecture? I dunno. Borrowed muscle/money from the shadow government?
I mean, Google already has 'ex'-NSA guys (no such thing as 'ex') on their payroll.
Ho hum.
-FL
> 1) They provide an alternative to Microsoft. Not only search, it looks like they will give a blow to hotmail as well. They prevent MSN from becoming the portal. I think this is very important, people see things can be done better than the Microsoft way, and it can be done with Linux ;-)
Very important. Microsoft thinks they own so much of the browser market that you now have to accept their self-signed certificate to log out of Hotmail. This means non-MS users are presented with a warning that there is no certificate authority behind this site. What's next? Drop the global DNS system and only allow MS browsers to find your site???
It's not Google, in fact; it's some geek coder who uses Google himself and forces USERS to use it too.
Forget .ini hacking; you must HACK THE EXE WITH A HEX EDITOR, I mean, the application, the .app, whatever.
:)
I give you a list:
1) Safari
2) Omniweb
3) Opera 7.x versions
4) Camino
5) Of course, Mozilla
Those browsers come with Google search as the default; in Safari it's more than
Opera is commercial, and so is Omniweb. I understand they make money by referring searches to Google by default, just like paid bookmark inclusion. I have, of course, sent them feedback too.
Do I have to use Google? Do we all have to? As a guy who paid for OS X, do I have to hack the Safari app itself just to use another engine?
Oh, and on OS X, guess which browser gives users a choice of search engine? IE 5.2.
Results 1 - 10 of about 5,750,000,000 for the [definition]. (0.11 seconds)
Doesn't that imply more than 4.285 billion?
This issue is a bit more complicated than you think.
"Google manages to achieve this with sophisticated techniques for rippling changes through the cluster, yet achieves 100 per cent uptime. This is serious stuff, and there are a lot of IT managers out there who would give their eye-teeth to be able to do it half as well."
Sigh... as an IT manager I can only dream of 50% uptime. Damn you, Google!
How much electricity they use:
- "the 2nd largest user of electrical power in Maryland ... yearly electrical bill is more than $21 million."
How big in # of people and budget:
- "if ... considered a corporation in terms of dollars spent, floor space occupied, and personnel employed, it would rank in the top 10 percent of the Fortune 500 companies."
I wonder how Google ranks on those metrics; that might give us a good ballpark feel for how much data they can store and process.
Wall Street should be seen for what it is: a plague upon American business and innovation.
You get your initial investment, which seems great, but then you sell your soul. You will be forced to "cut the fat" and "yield higher short-term profits", and all the research projects that make tech companies great will vanish.
This has happened with almost every great American tech company. How often do we see the kind of research that came out of Bell Labs today? We don't; instead we see former researchers who were once considered the "crème de la crème" of computer scientists out looking for work (most taking up teaching positions at universities).
Along with the pressure of Wall Street, Microsoft will soon be releasing its direct competitor to Google, and they will be pushing hard for industry domination.
Wall Street is the reason our tech jobs are going to India; Wall Street is the reason America is slowly becoming less and less of the technological superpower it used to be.
IMHO, Google should stay out of Wall Street and keep doing what it has been doing.
Then again, there are plenty of examples of companies that had a lot of hype for an IPO and are still strong and innovating today: VA Linux Systems, for example, oh, I mean VA Software, and their one product that is slowly being made obsolete by Free and Open Source alternatives.
Those two points are irrelevant. Google is all about software. The hardware is whatever they can pick up cheap. You may be able to tell whether their calculations are integer- or floating-point-heavy, but that's pretty useless to any other company as well.
As for their current commercial strategy, take over the world, I think we all know that.
That's because you (by you I mean we) are a geek. I wanna know too. So do most people here. Because we're geeks. Okay? We're fucking geeks. Dammit.
HI, MY NAME IS ISAAC.
The two examples he mentioned did not contain the phrase.
I don't know much about shares and stocks and stuff, but I've heard complaints in previous threads about 'greedy Wall Street bastards' buying heaps of shares to make a profit. Why doesn't Google set a maximum number of shares per person/organization? That would encourage a more even distribution of wealth.
Read the original item, and try the search. I used quote marks around the phrase, and it still came up with bogus results.
This is clearly a bug. If you read Google's own documentation, it says that it returns pages actually containing the phrase, not some useless "pages linked to" thing. The "pages linked to" data is for determining ranking, not for determining which results are returned.
"and my understanding is the anchor text is where "to be or not to be" can be found for these two pages you mention."
No, this proves that for this and other phrase examples, Google does not fulfill the purpose of the web search, since two of the top 10 pages do not contain the phrase asked for and are thus errors. Chaff. Trash links.
I don't run Google searches through code; I just search from www.google.com. I've noticed that Google's results are very buggy. Other search engines like Altavista have no problem with this: their results are 100% accurate. If it is so easy that Altavista can do it, why not Google?
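The reply quoted above suggests the phrase lives in the anchor text of links pointing to those pages. As a toy model of that idea (hypothetical names and data, not Google's code), a search can treat a link's anchor text as part of the page it points to, so a page can match a phrase its own body never contains:

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_in(toks, phrase_toks):
    """True if phrase_toks occurs as a consecutive run inside toks."""
    n = len(phrase_toks)
    return any(toks[i:i + n] == phrase_toks for i in range(len(toks) - n + 1))


def search(pages, links, phrase):
    """Toy phrase search where a page also 'owns' the anchor text of links pointing to it.

    pages: {url: body_text}
    links: [(source_url, target_url, anchor_text)]
    Returns (url, where_matched) pairs in page order.
    """
    p = tokens(phrase)
    incoming = defaultdict(list)  # target url -> token lists of incoming anchor texts
    for _source, target, anchor in links:
        incoming[target].append(tokens(anchor))
    results = []
    for url, body in pages.items():
        if phrase_in(tokens(body), p):
            results.append((url, "body"))
        elif any(phrase_in(a, p) for a in incoming[url]):
            # The page body lacks the phrase; only a link pointing at it contains it.
            results.append((url, "anchor"))
    return results


pages = {
    "hamlet.html": "To be, or not to be, that is the question",
    "shakespeare-home.html": "Complete works of Shakespeare",
}
links = [("blog.html", "shakespeare-home.html", "to be or not to be")]
print(search(pages, links, "to be or not to be"))
# [('hamlet.html', 'body'), ('shakespeare-home.html', 'anchor')]
```

In this model the second result is exactly the kind the poster calls a bug: it is returned because of a link, not because of its own text.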
It's a good article, but the page as a whole is annoying, due to several animated ads. I won't put up with that shit. I copied the text to my word processor for reading.
FruitCompanyA uses insecticide whereas FruitCompanyB doesn't. I personally would choose FruitCompanyA, as a rise in the insect population would ruin FruitCompanyB.
Good example, but I guess it's my turn to be nit-picky. Growing up around apple orchards teaches you a lot about how to grow apples. One lesson learned is that the best way to stop insects is not to use insecticide, but to use other insects, aka natural enemies (ever wonder why NZ-grown apples are so popular around the world?).
Again, a good example, as a good investor would have an understanding of where they are putting their money; otherwise they would be better off going somewhere like Las Vegas and putting it all on a blackjack table.
Karma? Hey I just call it as I see it.
Results 1 - 10 of about 5,660,000,000 for the
When I search google for "the".
So, 90 million pages just vanished?
This issue is a bit more complicated than you think.
"Google cluster actually has 100,000 servers" "More than half the company's 1,000 employees are techies" => Googles has 100 servers/employee => Google is IT company
>
Actually, their means of generating cash flow relies on how beneficial advertisers feel it is to advertise on Google.
Which is directly tied to their "brand". Google is a household name because they provide fast, relevant search results with a clean interface and relevant ads. This is only possible because of the hardware and software they run. It is what made them famous, and it is what keeps them up front. Their hardware and software are critical to their livelihood.
This is not the case with a company like General Electric.
=Smidge=
Any idea what symbol Google plans to trade under? Is Google-IPO.com the best source for news?
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
Out of curiosity, does anyone know if that is a Metric Billion or a US Billion?
The difference being:
A Metric (long-scale) Billion is 1 million million, i.e.:
1,000,000,000,000
Whereas a US (short-scale) Billion is 1 thousand million, i.e.:
1,000,000,000
A difference of three orders of magnitude! The Metric Billion is also referred to as a Mathematic Billion, and the US Billion is also referred to as a European Milliard.
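In powers of ten, and applied to the 4.285 billion page count quoted earlier in the thread:

```latex
\text{Metric (long scale):}\ 1\ \text{billion} = 10^{6} \times 10^{6} = 10^{12},
\qquad
\text{US (short scale):}\ 1\ \text{billion} = 10^{3} \times 10^{6} = 10^{9},
\qquad
\frac{10^{12}}{10^{9}} = 10^{3}

4.285 \times 10^{9}\ \text{pages (US reading)} \quad\text{vs.}\quad 4.285 \times 10^{12}\ \text{pages (Metric reading)}
```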
It's in that place where I put that thing that time
> Having a PhD does NOT make one smarter, it just means that said person found (or has) the financial means to become more educated...
Sure, anyone can buy a PhD from a diploma mill. That's not very hard. But reputable PhDs, the kind that Google would want to hire, have had to learn a lot of material, prepare a thesis, successfully defend that thesis and demonstrate their broad as well as in-depth knowledge of the subject. But I might be out to lunch on this one -- why don't you buy yourself a PhD for $199 and apply for a job at Google. Let us know how the interview goes, ok?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Yes, but why couldn't the response simply be something like this: We believe we currently have the computing capacity to handle Y hits per second. It is evenly distributed across X locations, so the destruction of i of those facilities would leave us with (X - i)/X of Y hits per second. We can add additional hardware at $Z per 10,000 hits.
Nothing in the information you asked for, other than the peak load they can handle, requires them to disclose how many machines they have, what each machine can do, etc.
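The capacity statement above can be made concrete. Here is a minimal Python sketch, assuming load is spread evenly across sites and that cost scales linearly at $Z per 10,000 hits per second; the function names and figures are hypothetical.

```python
def surviving_capacity(total_hits_per_sec, sites, sites_lost):
    """Capacity left when a total capacity Y is spread evenly over X sites
    and i of them are destroyed: each site carries Y / X, so Y * (X - i) / X remains."""
    if sites <= 0 or not 0 <= sites_lost <= sites:
        raise ValueError("need sites > 0 and 0 <= sites_lost <= sites")
    return total_hits_per_sec * (sites - sites_lost) / sites


def extra_hardware_cost(extra_hits_per_sec, dollars_per_10k):
    """Cost of adding capacity when hardware is priced at $Z per 10,000 hits per second."""
    return extra_hits_per_sec * dollars_per_10k / 10_000


# Hypothetical figures, not Google's: Y = 1,200 hits/s over X = 4 sites, i = 1 lost.
print(surviving_capacity(1200, 4, 1))   # 900.0
print(extra_hardware_cost(100, 5000))   # 50.0
```

Comparing the surviving capacity against the known peak load answers the investor's "can they make up the load?" question without revealing a single machine count.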
There's an older but great MP3 about how Google is set up on DDJ's TechNetCast website. The speaker is Jim Reese, Chief Operations Engineer at Google.
PS. On that website, I think the link to the MP3 doesn't work, but if you manually FTP into the server and get the file, it's fine.
Because Wal-Mart does not force its workers to join political organizations. Costco does.