How does Google do it?
Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."
Google has been at 4.285 billion pages for more than three months straight. The count hasn't increased in a long time... The index is maxed.
Google has recently removed tens of thousands of "duplicate content" sites from its index - where "duplicate content" is as simple as being an affiliate site (e.g. Amazon) and having the same textual item descriptions as many other sites.
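Nobody outside Google knows how its duplicate filter works. A common textbook way to flag near-duplicate pages, such as affiliate sites sharing the same item descriptions, is w-shingling with Jaccard similarity. The sketch below shows that generic technique; the 3-word shingle size and the 0.9 threshold are illustrative assumptions, not Google values.

```python
import re

def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1: str, doc2: str, threshold: float = 0.9) -> bool:
    # Hypothetical threshold chosen for illustration only.
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold

# Two hypothetical affiliate pages reusing the same product blurb.
page_a = "Acme 3000 blender with five speeds and a glass jar. Ships free."
page_b = "Acme 3000 blender with five speeds and a glass jar. Ships free!"
page_c = "Hand-knitted wool scarf in red, made in Scotland."
print(is_near_duplicate(page_a, page_b), is_near_duplicate(page_a, page_c))
```

Punctuation and case are ignored, so the two blender pages produce identical shingle sets, while the scarf page shares none with them.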
Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
Google is wavering.
Gmail is a distraction, a venture into some other space to keep people from noticing that their search product is degrading.
May she last as long as possible...
A large infusion of cash from some scary-assed three-letter agencies that would be very interested in a centralized repository of the tastes and proclivities of nearly everyone in the world connected to the Internet.
Sure would be nice to see some of that amazing tech coming back into the community...
I read the article and it didn't say much at all about how Google operated. Instead, it just said we don't know how they operate because they keep it secret. But maybe that was the point to begin with.
-Vic
The only thing it's missing now (IMO) is spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these three things, what's next? Google ISP? Google National Army?
I lost a couple of sites from Google this month, presumably due to duplicate content; they were nearly verbatim clones of some of my other sites. The original sites are still there, the "clones" vanished from Google. As in, even if I search for those domains directly, I get nothing, where I used to get a cached copy of the sites. They've quite literally vanished from Google's database.
Can you back up your assertions that Google's index is full? It's a rather interesting theory, and perhaps an explanation for all the tweaking they've done lately.
One -- Slashdot seems to be into content-directed ads now... as google was my ad for this story.
Two -- If you want your pages indexed faster and more frequently, sign up and place a Google AdSense ad on your page. Many webmasters believe that Google is having to index so many AdSense pages that it is difficult for Google to add many more non-ad-driven pages.
Just sign up for adsense and run it a couple of weeks while you build your site. After google has spidered your site well, then just drop adsense.
Good luck. I would love to hear any of your google-related tricks.
AC
Not just that, it seemed to me the entire article was based on 2 statistics that didn't add up. Statistics, I hasten to add, which don't even reflect the internal structure, and which could just as easily have come from an ISP grepping their logs and multiplying quite a few times.
I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!
Why would a shareholder care about server specifications? Investing is all about money. Read any quarterly report from a public company. Income statement, balance sheet, and cash flow are the primary interests on the numbers side, along with a general roadmap of where the company's heading. Warren Buffett doesn't care whether each server has two 80 GB drives or four 250 GB drives. The only thing that matters is that there are competent people to handle these kinds of "dirty details" that an investor doesn't give a rat's ass about.
Take a look at the kinds of information you could expect from Google's quarterly reports.
Putting on my computer scientist hat I would guess:
- instead of backup, hold data in multiple places at once
- use a "cascaded rsync" to trickle software changes to thousands of nodes
- then load software via NFS at node bootup
- use nodes just to store data; keep software in RAM for speed
Just a few thoughts.
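One reading of a "cascaded rsync" (an assumption; the comment doesn't define it) is a fan-out tree: every node that already has the new software pushes it to a few more nodes each round, so the number of updated nodes multiplies instead of growing linearly. The sketch below only computes the schedule; in practice each (source, destination) pair would be one rsync run from source to destination. Host names are hypothetical.

```python
def rounds_needed(total_nodes: int, fanout: int) -> int:
    """Rounds for a cascaded push where every up-to-date node copies to `fanout` new nodes per round."""
    if total_nodes < 1 or fanout < 1:
        raise ValueError("need at least one node and a fanout of at least 1")
    have, rounds = 1, 0          # start from a single seed node holding the update
    while have < total_nodes:
        have += have * fanout    # each holder pushes to `fanout` fresh nodes this round
        rounds += 1
    return rounds

def push_schedule(nodes: list[str], fanout: int) -> list[list[tuple[str, str]]]:
    """Return per-round (source, destination) pairs; nodes[0] is the seed."""
    done, pending = [nodes[0]], list(nodes[1:])
    schedule = []
    while pending:
        this_round = []
        for src in list(done):   # only nodes updated before this round act as sources
            for _ in range(fanout):
                if not pending:
                    break
                this_round.append((src, pending.pop(0)))
        done.extend(dst for _, dst in this_round)
        schedule.append(this_round)
    return schedule

hosts = [f"node{i}" for i in range(5)]   # hypothetical host names
for n, rnd in enumerate(push_schedule(hosts, fanout=1), start=1):
    print(n, rnd)
```

With a fanout of 4, ten thousand nodes are covered in six rounds (5^6 = 15,625 is the first power of 5 above 10,000), which is why a cascade scales better than one master pushing to every node in turn.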
GIMMEE would be nice. Well, nice for a while, and only if they didn't get weird with it. Don't know if that could happen, though; nature of man and all that philosophical stuff. It goes along with the current VoIP articles. They would dominate the net if they implemented that. I know I would pay them cash for a universal, works-great, any-OS VoIP service and a no-spam, no-commercial email service.
So far we know they have just a cubic load of servers, most likely the most on the planet held by one private company. The government probably has more, but it's a mishmash of them, not nearly as sleek or coordinated, AFAIK. What COULD be next for them? Practical, cheap 50-dollar thin clients that you could do a TON on using distributed computing, from games to communication to running any business? With the tech savvy they've got, their already established heavy hardware base and their heavy commitment to R&D, they could just 'splode with an extra 25 billion in cash all of a sudden from an IPO. OR, the money could get to them and they become just another weird company that forgets its roots as "brains come first" and switches to "marketing crap comes first" like certain other unnamed megacorps do now.
Interesting times
There is plenty of evidence to suggest that Google has run out of docids, hitting the 32-bit integer limit.
The best evidence is doing a search which returns results which say "Supplemental Result" next to them. That'll be coming from a second document store I'd guess.
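The arithmetic behind the docid theory is easy to check. This assumes unsigned 32-bit IDs, which is the commenter's speculation rather than anything Google has confirmed; the page count is the one shown on the Google homepage.

```python
DOCID_BITS = 32                   # assumed ID width (speculation from the comment)
MAX_DOCIDS = 2 ** DOCID_BITS      # 4,294,967,296 distinct unsigned IDs (0 .. 2**32 - 1)
INDEXED_PAGES = 4_285_199_774     # count advertised on the Google homepage

headroom = MAX_DOCIDS - INDEXED_PAGES
print(f"{MAX_DOCIDS:,} possible IDs, {headroom:,} left, {INDEXED_PAGES / MAX_DOCIDS:.2%} used")
```

Only about 9.8 million IDs would remain, under a quarter of one percent of the space. A signed 32-bit ID tops out at 2,147,483,647, well below the advertised count, so the theory only works if the IDs are unsigned.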
much more frequent in Linux than in proprietary systems from Microsoft or Sun
Huh? Does it!? Since when? I like these throw-away lines the media people dish out. What is their basis for this statement? Even when they see Linux obviously succeeding, they dish out a statement like this.
I certainly don't have to patch my Linux boxes as frequently as my Windows boxes. Actually... no... wait, they're right! I only need to patch Windows once. Ctrl-Alt-Del -> Boot Debian CD.
Searching for 'the' gives about 5,740,000,000 pages, while they index 'only' 4,285,199,774 web pages... Does anyone know why?
Some comments on the linked article:
> it means you're all Linux users.
What is that - guilt by association?
>how do you implement security patches and operating-system upgrades (much more frequent in Linux than in proprietary systems from Microsoft or Sun) on thousands of servers without causing disruption to service?
You don't implement any security patches and upgrades because those systems are used only by Web servers; it's not like some Web client will hack into their servers... You boot thousands of servers from NFS or such; you upgrade system images once a quarter, together with Google's own software.
>yet achieves 100 per cent uptime.
Uptime of what? Of www.google.com, using round-robin load balancing to several geographically dispersed data centers. What's the big deal about that?
But I've seen a 404 on www.google.com, and the paid AdWords Admin Web is down quite often (anyone who has ever used it knows what I'm talking about).
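Round-robin balancing across data centers just hands each new client the next site in a fixed rotation. Plain round-robin DNS does no health checking, so clients can still be sent to a site that is failing; the `mark_down` skip below is an addition for illustration, and the data center names are hypothetical.

```python
from itertools import cycle

class RoundRobin:
    """Hand out backends in strict rotation, skipping any marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._ring = cycle(self._backends)   # endless iterator over the backend list
        self._down = set()

    def mark_down(self, backend):
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def pick(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backends")

lb = RoundRobin(["dc-east", "dc-west", "dc-europe"])   # hypothetical sites
print([lb.pick() for _ in range(4)])
```

Without the health check, a single broken site keeps receiving its share of traffic, which is one way users can see errors even when "the site" as a whole is up.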
I think this is the wrong question investors need to be asking about Google before they IPO. Sure, it makes for some great geek gab; the fetishistic wonderment of just how many servers Google is running, how many hits they get and how exactly they manage to, well, manage that many servers. In the end though, answering those questions doesn't tell us anything about what Google is actually selling.
The more I look at it, the more I fear Google is nothing more than a very well calculated shell game; the Enron of technology IPOs...
Pretty much everyone who uses the internet loves Google, and we do so for a combination of three compelling reasons. First, Google offers what is basically the best search engine on the internet. It isn't perfect and it doesn't work all the time, but it is the best thing out there right now. Second, they offer this high-quality search service without all the excess bullshit that got tacked onto all of the other search engines on the market in the .com heyday. While Yahoo was busy playing in Hollywood and becoming a "Portal" and AltaVista was going down the tubes, Google's simple, whimsical, easy-to-use front page didn't get gaudy by trying to make us sign up for accounts or any of the other marketing department crap. Finally, Google has a high Willy Wonka factor, sort of like Apple. We don't hear much from the company in the way of press releases or other information, but every so often they open the doors and it turns out the PhD Oompa-Loompas there developed something totally cool. Local search, Froogle, gMail and Orkut are examples of this...
The thing that gives me the heebie-jeebies about Google is how they make all of this look so effortless. Orkut just sort of popped out into the open one day. gMail appeared on April 1 with such an "effortless" air about it all that Google didn't even bother to take the press release seriously. We keep hearing these cryptic references from the company about some overwhelmingly massive amount of computing power they have and how their cabal of PhDs has it humming along with levels of efficiency that are a world beyond most everything else out there.
All of this has made for a very pumped-up environment for an IPO, but we still have yet to get an answer to the question "What is Google's business model?" I "google" words all day. I have an Orkut account that I use. I even use Google as a quick and dirty calculator. When it opens up, I will have a couple of gMail accounts. The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?
Sure, we can say that Google has integrated advertising within the search results, but the advertising model has always proven to be of dubious effectiveness at best. Google has an enterprise search division, but the cost of their Google Appliance is a pittance compared to the sort of money big-time enterprise software companies like Oracle and SAP are making. How can they survive on that revenue stream and pay the bandwidth bills for all of the free services they offer to the public?
We always tend to answer these questions with an "I don't know, but Google must be doing something right." Google works very hard to keep fueling the impression that they are doing something paradigm-shifting with all of those PhDs they have on the payroll, and how many servers they have, and how they can just sort of effortlessly announce 1 GB free email accounts. We keep coming away with the impression that these guys must have something HUGE up their sleeves, and they have us salivating for the IPO so we too can be part of it.
Very soon, Google executives are going to pile onto a Gulfstream V and do a roadshow for big-time investment houses and institutional investors, and they are going to be trying to convince these guys to buy into the Google IPO. They are going to be asked exactly what sort of business model Google is going to be pushing, and one of two things is going to happen:
- Google will c
Man if you care about that crap you have some serious problems.
SafeSearch IS A FILTERED SEARCH. Shit happens; these are the limits of technology, which is why you have to take extra steps to use it. Obviously it's going to have false hits, but that's life. Nothing's perfect and it's NOT GOOGLE'S FAULT.
And the anti-jewish site?
Well, that's just plain bullshit. Some FUD spread by a reporter trying to get his name and his article spread around.
If you don't like it, use AltaVista.
Akamai?
"When I visited the company in January, the screen said that Akamai was serving 591,763 hits per second, with 14,372 CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes of total storage. On April 14 [2004], the number had jumped to a peak rate of 900,000 hits per second and 43.71 billion requests delivered in a 24-hour period."
From this article.
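The quoted Akamai figures can be cross-checked with a little arithmetic. Note that the CPU and clock numbers are from the January visit while the request numbers are from April 14, so only like-dated figures are combined below.

```python
CPUS = 14_372                       # January figure
TOTAL_GHZ = 14_563                  # January figure
DAILY_REQUESTS = 43_710_000_000     # April 14, 24-hour total
PEAK_RATE = 900_000                 # April 14 peak, hits per second
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

avg_ghz_per_cpu = TOTAL_GHZ / CPUS
avg_rate = DAILY_REQUESTS / SECONDS_PER_DAY
peak_to_average = PEAK_RATE / avg_rate
print(f"{avg_ghz_per_cpu:.2f} GHz per CPU, {avg_rate:,.0f} req/s average, peak {peak_to_average:.2f}x average")
```

That works out to roughly 1 GHz per CPU and an average of about 506,000 requests per second on April 14, with the peak running at a bit under 1.8 times the daily average.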
No, they have to have people who understand technical details to be able to produce legitimate forecasts of output. I'm sure there are people who analyze how many workers and robots Ford has to estimate how many cars it can produce, right? So the equivalent is how many coders and systems Google has, no?
Well if they don't, big brokerage houses can reply and I will consider the most lucrative offer.
espo
Searching for "the" returns:
Results 1 - 10 of about 5,690,000,000 for the [definition]. (0.11 seconds)
But Google homepage says:
Searching 4,285,199,774 web pages
Is there a difference?
Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
It's possible that the index is full, but I would imagine that they would have seen this coming long ago, as it "filled up", and taken measures. What's more likely behind the elimination of duplicate pages is that more and more people have been complaining about the search results relevancy and how site owners have been taking advantage of certain known flaws in the Google algorithm. So, they are taking steps to fix the algorithm, and kill off all the fake sites.
Yes, we need a Google Linux for the desktop. If Google sells out, what a shame: Warren Buffett will have taken over the planet. The whole idea being how to make a lot of money without working for it.
If work is defined as 'the expenditure of energy to the benefit of others that cannot be achieved by automation', then not many people accumulate money by honourable actions. In other words, they get paid without doing something for it.
The lads at Google have done a marvelous job, and all the greedy fat capitalist bastards want to do is stick their noses in the trough and suck their sustenance. Whilst shagging the planet.
I can understand how in some cases an IPO can help generate revenue necessary to operate and break into new markets, but does this apply to Google? I really don't think so. They have market share; they have resources. Any infusion of funds to the company is more likely to give them the ability to further diversify and enter different markets, which history has shown is more often than not, a bad business idea.
So one has to assume the IPO is the first phase of the principals' "cashing out". The press will probably signal this as a sign of the next dot-com boom, and a bunch of nerds within the company will suddenly become millionaires, and subsequently quit their jobs and open up a Bed & Breakfast in some obscure town or join the World Poker Tour. There goes the talent.
Results 1 - 10 of about 5,750,000,000 for the [definition]. (0.11 seconds)
Doesn't that imply more than 4.285 billion?
This issue is a bit more complicated than you think.
Wall Street should be seen for what it is: a plague upon American businesses and innovation.
You get your initial investment, which seems great, but then you sell your soul. You will be forced to "cut the fat" and "yield higher short-term profits", and all the research projects that make tech companies great will vanish.
This has happened with almost every great American tech company. How often do we see the type of research that came out of Bell Labs today? We don't; instead we see former researchers who were once considered the "crème de la crème" of computer scientists out looking for work (most taking up teaching positions at universities).
Along with the pressure of Wall Street, Microsoft will be releasing its direct competitor to Google soon, and it will be pushing hard for industry domination.
Wall Street is the reason that our tech jobs are going to India; Wall Street is the reason that America is slowly becoming less and less of the technological superpower it used to be.
IMHO, Google should stay out of Wall Street and keep doing what it has been doing.
Then again, there are plenty of examples of companies that had a lot of hype for an IPO and are still strong and innovating today, VA Linux Systems for example, oh, I mean VA Software, and its one product that is slowly being made obsolete by Free and Open Source alternatives.
Actually, their means of generating cash flow relies on how beneficial advertisers feel it is to advertise on Google.