300 Years to Index the World's Information
Kasracer writes: "At the Association of National Advertisers annual conference, Google's CEO, Eric Schmidt, suggested that it would take 300 years for them to index all of the world's information. From the article: 'We did a math exercise and the answer was 300 years,' Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. 'The answer is it's going to be a very long time.'"
Did they take into account the information that is being created as they are indexing? Do they plan on live indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that already exists.
"Plans are for fools! Oglethorpe, the plutonian (Aqua Teen Hunger Force)
I agree. At Google's scale and beyond, the concept of 'information' is such a wooly one.
How the hell did they come to that figure of 300 years?
I'd like to see their definition of information. Certainly, a lot of things that are already of common interest are on the net. Occasionally I find things that aren't available online, but the vast majority of the time Google is able to find what I want.
To further the example: at work we have several filing cabinets that haven't been opened in years. There are lots of papers and stuff in there, I can vouch for that. Some might consider it "information." But in reality all that stuff could be burned and I doubt it would make the slightest difference in the way the future rolls out. None of it is stuff that would ever be needed for an IRS audit or anything like that either. Does Google count this kind of stuff as part of its efforts? Because I think they can safely ignore it.
We did a math exercise? What exercise?
To estimate the time involved, you surely need to know the size of the information involved (don't quote me that bunkum about 170 terabytes in TFA - yes I did read it), and to know the size you need to know what all the information is, which you can't (and surely new information is created all the time?).
This translates as "I pulled my finger out my ass, waved it in the air and came up with 300 years."
Only if it includes her home address.
Mother, do you think they'll like this sig?
What question is that? What happens inside a woman's head?
"We did a math exercise and the answer was 300 years," Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. "The answer is it's going to be a very long time."
Since this was in response to an audience member's question, does anyone else think he was joking? Because it is such an outlandish question from an information theory and modeling point of view, perhaps he was mocking it? "Ah yes, we just came up with an equation and it should take 294.59 years." I think this also makes sense in light of his next comment, which was made on a more serious note. I interpret it as, "We really didn't use an equation, it will obviously take a long time though." This is how I understood his comments, and I may be wrong, but it wouldn't surprise me if some reporter picked up on this "joke" and ran it as "news".
Nationalize Google? Are you joking or just insane? You want to take one of the most innovative and successful companies that the US has right now and nationalize it!?
I have a better idea: how about you just send out a government hit squad to put a bullet between the eyes of every single entrepreneur in the US? It will accomplish the same sort of freeze in the growth of innovative small businesses but look far less insane.
Like Anne Frank's?
Fact is, it's incredibly hard to determine today what will have value tomorrow. Most of those thirteen-year-old girls' (or 20-something geek guys') blogs will have no historical value. But some of those people will grow up to have a profound impact on the world (or they may not grow up, but still have a profound impact, as was the case with Anne Frank). It may be ten years from now. Or 50.
Who knows what the writing they do now might tell us about what brought them wherever they end up? When people write diaries on paper, chances are reasonable that they'll survive and show up in an attic somewhere. But as more and more content gets online, we also risk losing entire generations' worth of many types of information to bit rot and simple lack of foresight.
"Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he (Schmidt) said earlier during his speech."
So ... how many terabytes of info will be produced in the next 300 years, and does anyone really think that Google (or anyone, or everyone) could keep up?
Especially once all 20 billion people who live in the Solar System are video-documenting every moment of their existence ...
OK, so I project and exaggerate ...
I read the article, and this is what I got from it. I could be wrong.
- 5 million TB of data.
- 170 TB have already been indexed.
- It would take 300 years to index that data and make it searchable.
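For a rough sense of scale, here is the arithmetic implied by those three figures, as a minimal Python sketch. It assumes the 5 million TB total stays fixed, which earlier comments rightly doubt. The second helper, `years_to_catch_up`, is a hypothetical model that assumes new information arrives at a constant rate; it is an illustration, not anything Google has described.

```python
# Figures as reported in the article; units are terabytes.
TOTAL_TB = 5_000_000
INDEXED_TB = 170
YEARS = 300


def implied_rate_tb_per_year(total_tb, indexed_tb, years):
    """Average indexing rate needed to clear the remaining backlog in `years`,
    assuming the total amount of information never grows."""
    return (total_tb - indexed_tb) / years


def years_to_catch_up(backlog_tb, rate_tb_per_year, new_tb_per_year):
    """Hypothetical model: years to clear a backlog while new information keeps
    arriving at a constant rate. Returns None if indexing never catches up."""
    net = rate_tb_per_year - new_tb_per_year
    if net <= 0:
        return None  # new data arrives at least as fast as it is indexed
    return backlog_tb / net


rate = implied_rate_tb_per_year(TOTAL_TB, INDEXED_TB, YEARS)
print(f"{rate:,.0f} TB/year")
```

Run as-is, it prints an implied average of about 16,666 TB a year, roughly a hundred times the 170 TB the article says has been indexed so far. Passing any nonzero `new_tb_per_year` to `years_to_catch_up` stretches the 300 years, and once new data arrives as fast as it is indexed, the function returns `None`: the backlog never clears.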
I don't think it's an exercise to index all knowledge. As you point out, that would be illogical. I think it's more of an understanding of what it would take to effectively and completely serve the world's information needs given current indexing capabilities.
I guess that by establishing a current benchmark, both of how efficiently they index information and of roughly how much data is out there, they can gauge how their efficiency improves relative to the rate at which the amount of potentially indexable data grows.
A burrito tricked me.
Practice has shown that government ownership and operation of airports is inferior to private ownership.
Contribute to civilization: ari.aynrand.org/donate
Look, the problem is not how much data there is in the world, the problem is finding a general automatable algorithm for organizing it in such a way that J. Random User can rapidly find what he's looking for.
Stroll on down to the nearest university library. It's got a lot less information in it than Google is considering, and about a hundred thousand man-years over a few centuries have gone into finding clever ways to organize it all: card catalogs, shelving systems (e.g. Dewey and his decimals), nowadays searchable electronic catalogs, reference books, specialized indices for law and science and medicine, citation indices, reviews, reviews of reviews...and so on and so forth forever. And yet, it can still be immensely difficult to track down a particular piece of information you want. Even if it can be done, it often takes a fair amount of expertise in a field just to know where to look. Where do you find public information on patents for desalination processes? How do you find out if anyone has synthesized a polymer resin that melts between 130 C and 150 C and is resistant to acid, with a tensile strength of about X? What was the common law meaning of "ownership in fee simple" in 1680s England? Even to start looking for the answers, you often need great experience in the relevant field, so you know where to start looking: the "search terms", we might say.
Google may be feeling its oats because they can now very rapidly provide the most obvious things people want -- directions to San Diego from Ukiah, the times and places Serenity is playing on Sunday, the lead story of the New York Times "Style" section last Sunday, or the names and addresses of the six pizzerias closest to me still open at 11:25 PM.
But this is utterly small potatoes compared to the problem of organizing information generally, so that it is useful to professionals during the weekday as well as for amusement on the weekend. It is first, generally speaking, an unsolved problem -- no library or information index I've ever used fails to have at least one frustrating "feature" that leaves me scratching my head, wondering what the heck the designers were thinking. Secondly, I very much doubt Google has the depth of professional expertise in-house to even begin to figure out how to organize all the giant repositories of information in law, science, engineering, literature et cetera in such a way that professionals can use them, let alone amateurs.
And finally, they don't have the money to do it, and it will be very hard for them to raise it. Indices have suffered from this problem for a long time: any given user will only pay a very small price per search, but it costs a huge amount to make the index. Heretofore, makers of indices and dictionaries and references have relied on selling them at very high prices to libraries, which in turn raise the money in small bits from their patrons, or taxes. But Google would cut out the library middleman -- you search directly. So how are they going to cover their costs? They've no easy way to charge you $0.005 every time you do a Google search, for example.
In short, this sounds like the 21st century equivalent of that 1950s nuclear energy braggadocio, "energy too cheap to meter." Call it "information too cheap to meter." Color me skeptical.