300 Years to Index the World's Information


Posted by Zonk on Sunday October 9, 2005 @09:27AM from the everything-else-to-be-destroyed dept.

Kasracer writes "At the Association of National Advertisers annual conference, Google's CEO, Eric Schmidt, suggested that it would take 300 years for them to index all of the world's information. From the article: 'We did a math exercise and the answer was 300 years,' Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. 'The answer is it's going to be a very long time.'"

14 of 248 comments

The major question is by the-amazing-blob · 2005-10-09 09:29 · Score: 1, Interesting

Does that estimate include how much pocket lint I have?

Seriously, though, why would anyone want to index all the info in the world? That's kinda weird, in my opinion.
1. Re:The major question is by michaeltoe · 2005-10-09 09:58 · Score: 2, Interesting
  
  Because once it's all there, you don't have to look for it anymore.
i hereby propose by circletimessquare · 2005-10-09 09:38 · Score: 1, Interesting

i hereby propose a new measurement for time: the google year

as we can measure disk space in libraries of congress,

and measure distance in light years...

a google year will be thus: the 300 year span it will take google to morph from geek-friendly search engine to big brother overlord who knows more about you than you do yourself

for example: it has been (2005-1492)/300 = 1.71 google years since columbus first set foot in america

and it will be 1 - (2005-1998)/300 = 0.9767 google years until your great-great-great-great-great-grandchildren will be nothing more than information slaves to the great google dominion
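
the two conversions above are easy to check. a minimal python sketch, assuming the 300-year span from schmidt's estimate and 1998 as google's starting year (both taken from the arithmetic above); the helper name is made up for illustration:

```python
GOOGLE_YEAR = 300  # calendar years per "google year", per the estimate quoted in the story


def to_google_years(calendar_years: float) -> float:
    """Convert a span of calendar years into google years (hypothetical helper)."""
    return calendar_years / GOOGLE_YEAR


# Time elapsed since Columbus landed, measured from 2005.
since_columbus = to_google_years(2005 - 1492)

# Fraction of one google year still remaining, counting from 1998.
until_dominion = 1 - to_google_years(2005 - 1998)

print(f"{since_columbus:.2f}")   # 1.71
print(f"{until_dominion:.4f}")   # 0.9767
```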

--
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Re:What About... by htrp · 2005-10-09 09:41 · Score: 5, Interesting

I would assume that it would be to index the collective sum of information, even as it is growing. It's probably a lot quicker to index something than it is to generate it. With probable future advances in computing power and the development of new algorithms, it should be entirely possible that the speed of indexing (which already probably surpasses the speed of information production) would catch up to all the data that still hasn't been indexed.

Think of it in terms of taking a ratio comparison of two infinite series.
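
One way to make the catch-up argument concrete is a toy simulation. This is only a sketch under stated assumptions: the starting figures (5 million terabytes total, 170 terabytes indexed) are the ones quoted from the article elsewhere in this thread, while the growth rates, the starting indexing rate, and the function name are hypothetical and chosen purely for illustration.

```python
def years_to_catch_up(total_tb, indexed_tb, index_rate_tb,
                      info_growth, index_growth, max_years=1000):
    """Return the first year in which the index covers all information,
    or None if that does not happen within max_years.

    total_tb      -- information in existence at the start (TB)
    indexed_tb    -- information already indexed at the start (TB)
    index_rate_tb -- TB indexed during the first year (assumed)
    info_growth   -- yearly fractional growth of all information (assumed)
    index_growth  -- yearly fractional growth of indexing capacity (assumed)
    """
    for year in range(1, max_years + 1):
        total_tb *= 1 + info_growth               # new information is produced
        indexed_tb = min(total_tb, indexed_tb + index_rate_tb)
        index_rate_tb *= 1 + index_growth         # indexing gets faster
        if indexed_tb >= total_tb:
            return year
    return None


# Hypothetical rates: information grows 5% a year, indexing capacity 10% a year.
print(years_to_catch_up(5_000_000, 170, 170, 0.05, 0.10))

# If indexing capacity grows no faster than information, the gap never closes.
print(years_to_catch_up(5_000_000, 170, 170, 0.05, 0.05))
```

In this model the index only catches up when indexing capacity grows faster than information does, which is the "ratio of two series" point: what matters is the ratio of the two growth rates, not the size of the starting backlog.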
On a related note... by RyanFenton · 2005-10-09 09:42 · Score: 4, Interesting

I wonder how many man-years it would take to listen to all the music and video that could be indexed. Be interesting at least to find out what the order of magnitude would be - millions, or perhaps billions or trillions of man-years of unique recorded audio and video? It would have to be a game of gross estimation - but it would at least put into perspective how much material is out there, even if most of it is boring "security" footage, compared to the scope of our lives.

It'd be interesting, if, perhaps in a couple generations, we could have a cheap media volume that contained "recorded media, prehistory - to - 2050ad"... if the media that exists today even survives a couple generations, and copyrights aren't extended indefinitely. The idea of an indexing system that can even put all that information into a meaningful context would be fascinating to consider though, if it could be possible.

Ryan Fenton
Competition? by psst · 2005-10-09 09:44 · Score: 4, Interesting

From the article:
Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he said earlier during his speech.
Storing 5 million terabytes has got to cost a lot of resources. It would be very inefficient if every competing search engine stored that much data. Makes me wonder if it would make more sense to nationalize Google's index and share it amongst competitors (just like it makes more sense for governments to build airports and share them amongst airlines rather than every airline building its own airports).
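
For a sense of scale, the two figures quoted from the article give the share of the world's information indexed so far. A minimal Python sketch, using only those two numbers (the variable names are just for illustration):

```python
TOTAL_TB = 5_000_000   # approximate information in the world, per the article
INDEXED_TB = 170       # amount indexed so far, per the article

fraction_indexed = INDEXED_TB / TOTAL_TB
remaining_tb = TOTAL_TB - INDEXED_TB

print(f"{fraction_indexed:.4%}")   # 0.0034%
print(f"{remaining_tb:,} TB")      # 4,999,830 TB
```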
1. Re:Competition? by Halfbaked+Plan · 2005-10-09 12:00 · Score: 4, Interesting
  
  Oh, come on. You're talking about a company that is mostly an advertising enterprise now. Who is Google hiring? Admen and their ilk. It's sometimes depressing how enamored the 'community' has become of a company whose main purpose is leveraging eyeballs to look at their ads.
  
  (how DARE I say anything bad about Google. Mod this down IMMEDIATELY.)
  
  --
  resigned
I'm curious... by DeepBlueDay · 2005-10-09 09:46 · Score: 2, Interesting

How is 'information' defined in this context? Is a thirteen-year-old girl's blog considered information?
1. Re:I'm curious... by Vellmont · 2005-10-09 12:50 · Score: 4, Interesting
  
  I think the parent's question is perfectly valid. What is considered "information"? I'd consider a blog information, but is a painting some random artist creates included in this list of "information"? Is my laundry list information? How about my individual handwriting in my laundry list?
  
  The question of whether something is valuable isn't exactly an either-or proposition, but a matter of assigning a probability that a certain piece of information is valuable. Couldn't we agree that, say, the president's day-to-day activities are more likely to be important in 100 years than, say, a single 13-year-old's blog? Does that mean that 13-year-olds' blogs are worthless? Well no, but they aren't the thing I'd first choose to preserve.
  
  The question I have is, is the greater difficulty in control over online information balanced by the greater ease of keeping it around? Google doesn't delete messages from email for this very reason. We tend to throw stuff away because it takes up too much space, or because it just becomes clutter. But with increased storage space every year and better ability to keep track of it (and separate it from things we consider important), why ever throw away information?
  
  Online information portability is obviously a problem. How do you move someone's blog somewhere else, and have it mean anything in say 50 years? I think these problems will be solved as people expect information to be more portable and standardized. The solutions I think will come from the short term portability and needs rather than a few people wanting to preserve something for the next 100 years though. Many people make the assumption that standards are short lived things that are here today, gone tomorrow. I'd have to disagree on a historical basis. How old are reel-to-reel tapes? And you can still find a player at, say, a thrift store. CD-audio has been around for 25 years and is still the default medium for music today. ASCII was developed I don't know how long ago and yet is still quite popular, and if you have a computer that can't read it, you've got a fairly useless computer. Standards have a way of snowballing and gathering momentum to live on a long time.
  
  --
  AccountKiller
Not the Moore model but the Bono model by tepples · 2005-10-09 09:52 · Score: 2, Interesting

No, the proper model is not Moore's law but Bono's law. If it takes 300 years now, then it'll take 320 years in 20 years, and most of the time will be spent waiting for exclusive rights to expire (if they ever do). For instance, indexing a literary work that's out of print and not widely available at libraries requires getting a new copy, and those aren't available until the copyright runs out.
webcams and other continuous data collectors by G4from128k · 2005-10-09 10:21 · Score: 3, Interesting

This analysis must exclude entire categories of continuous data collection devices such as webcams, data loggers, OS log files, sensing equipment etc. All jokes aside about porn on webcams, I can imagine that future historians would love such a rich data source on how people lived their lives, what they have in their surroundings, etc.
The point is that many current systems spew a huge volume of low value (but nonzero value) data (multiple MB or GB/day/device). The lack of storage means most of this is not captured and is thus never indexed.
Even massive companies can't keep all their data. Wal-Mart stores on the order of 460 TB in their data warehouse, but only has room for the last 13 months of data or so. At 138 million customers per week, they only have room for a paltry 59kB per customer per week.
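
The per-customer figure follows from the numbers above. A rough Python sketch, assuming decimal terabytes (10^12 bytes) and 13 months taken as 13/12 of a 52-week year; both conventions are assumptions, not something the figures specify:

```python
WAREHOUSE_BYTES = 460e12        # 460 TB, assuming 1 TB = 10**12 bytes
CUSTOMERS_PER_WEEK = 138e6      # 138 million customers per week
WEEKS_RETAINED = 13 * 52 / 12   # about 13 months of history, in weeks

# Storage available for each customer visit over the retained window.
bytes_per_customer_week = WAREHOUSE_BYTES / (CUSTOMERS_PER_WEEK * WEEKS_RETAINED)

print(f"{bytes_per_customer_week / 1000:.0f} kB per customer per week")
```

Small changes to either convention move the result by a few percent, but it stays in the neighborhood of 59 kB.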

--
Two wrongs don't make a right, but three lefts do.
There are a lot of areas where essentially this... by hackwrench · 2005-10-09 11:06 · Score: 2, Interesting

question is asked, and they seem to miss that the answer is that it is its own index.
Re:What About... by Max+Nugget · 2005-10-09 12:19 · Score: 3, Interesting

Did they take into account the information that is being created as they are indexing? Do they plan on live indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that's already been done.

Funny you mention that. In some versions of Superman, Brainiac, a living computer whose mission is to gather all information about every planet in the universe, entered into the world of villainy because he logically reasoned that the only way he could ever "complete" his mission would be to gather information about each planet and then destroy the planet, since allowing the planet to continue existing would result in a never-ending cycle of new information that would need to be recorded, making it impossible to ever reach a "done" state. Not surprisingly, then, Brainiac's goal is ultimately to destroy the entire universe. :)
Re:Longer than expected by Almost-Retired · 2005-10-09 15:04 · Score: 2, Interesting

42 years, from Douglas Adams HHGTTG? Yes, I expect it will be enough since storage and computer power growth will foreshorten his estimated 300 years. But one possible constraint might exist, that of finding the energy to power all that, and to cool it. But who knows what we'll be using to add 2&2 15 years from now, I don't & won't because I'll probably be returning to dust by then, although some of the whatif press sure seems positive.

On a side note, since they are restricted to doing verbatim the works that are out of copyright, how about we start lobbying our reps to pass a law that says if the material is rights protected by some encoding where the DMCA prohibits the defeat, and there is not an AUTOMATIC expiration of the restrictions based on the time when the material would pass into public domain, then such material, since it can never pass into the public domain without violating the DMCA, is to have no copyright protections under the copyright laws whatsoever. After all, if it cannot pass into the public domain without breaking the DMCA restrictions, it will never pass into the public domain.

Such material should be granted a copyright ONLY if it can legally pass into the public domain at the end of the copyright period. Put the RIAA and MPAA on notice that they can have their cake ONLY if they don't eat it. One or the other but not both.

FWIW, I do not consider the maintenance of a securely vaulted, unprotected copy of the work to be a valid defense unless this copy is transferred absolutely verbatim, to whatever lossless media is the currently used favorite about every 5 years so that it would become available and usable on the equipment of the time when the copyright does run out, along with suitable high penalties for not meeting their obligations under the copyright statute.

Make it a part of this proposed copyright addendum that the continuance of the copyright is contingent on the court, at someone's request, requiring they trot out the equipment in common use at the time, and perform or otherwise show the court that they have an unrestricted copy instantly available in case its copyright should end that day. If they cannot do this, then the DMCA is null and void for that work and the copyright is terminated instantly.

And, the copyright holder going bankrupt immediately causes the material to become public domain since there will then be no one to assure the copyright statute is observed and obeyed. Bankruptcy is too often used as a means to transfer such "property" in such a manner as to cause the ownership trail to be so obfuscated that there is no one in authority to see to it the copyright statute obligations vis-a-vis the transfer to public domain will ever be done. Remove that glaring loophole and quite a few bankruptcies will be stopped.

What say the rest of the /.ers here? Can we do it? Write your reps, on paper, expressing your views on the subject & let's see what happens...

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly