300 Years to Index the World's Information

Longer than expected by powerpuffgirls · 2005-10-09 09:28 · Score: 5, Funny

I always thought 42 years ought to be enough.

Re:Longer than expected by Anonymous Coward · 2005-10-09 10:50 · Score: 3, Funny

64 years should be enough for anyone.
Re:Longer than expected by Almost-Retired · 2005-10-09 15:04 · Score: 2, Interesting

42 years, from Douglas Adams HHGTTG? Yes, I expect it will be enough since storage and computer power growth will foreshorten his estimated 300 years. But one possible constraint might exist, that of finding the energy to power all that, and to cool it. But who knows what we'll be using to add 2&2 15 years from now, I don't & won't because I'll probably be returning to dust by then, although some of the whatif press sure seems positive.

On a side note, since they are restricted to doing verbatim the works that are out of copyright, how about we start lobbying our reps to pass a law that says if the material is rights protected by some encoding where the DMCA prohibits the defeat, and there is not an AUTOMATIC expiration of the restrictions based on the time when the material would pass into public domain, then such material, since it can never pass into the public domain without violating the DMCA, is to have no copyright protections under the copyright laws whatsoever. After all, if it cannot pass into the public domain without breaking the DMCA restrictions, it will never pass into the public domain.

Such material should be granted a copyright ONLY if it can legally pass into the public domain at the end of the copyright period. Put the RIAA and MPAA on notice that they can have their cake ONLY if they don't eat it. One or the other but not both.

FWIW, I do not consider the maintenance of a securely vaulted, unprotected copy of the work to be a valid defense unless this copy is transferred absolutely verbatim, to whatever lossless media is the currently used favorite about every 5 years so that it would become available and usable on the equipment of the time when the copyright does run out, along with suitable high penalties for not meeting their obligations under the copyright statute.

Make it a part of this proposed copyright addendum that the continuance of the copyright is contingent on the court, at someone's request, requiring they trot out the equipment in common use at the time, and perform or otherwise show the court that they have an unrestricted copy instantly available in case its copyright should end that day. If they cannot do this, then the DMCA is null and void for that work and the copyright is terminated instantly.

And, the copyright holder going bankrupt immediately causes the material to become public domain since there will then be no one to assure the copyright statute is observed and obeyed. Bankruptcy is too often used as a means to transfer such "property" in such a manner as to cause the ownership trail to be so obfuscated that there is no one in authority to see to it the copyright statute obligations vis-a-vis the transfer to public domain will ever be done. Remove that glaring loophole and quite a few bankruptcies will be stopped.

What say the rest of the /.ers here? Can we do it? Write your reps, on paper, expressing your views on the subject & let's see what happens...

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly

New hardware needed by nizo · 2005-10-09 09:33 · Score: 4, Funny

The hardest part will be developing the hardware that is able to recursively index the Google data itself an infinite number of times.

--
I Am My Own Worst Enemy

Re:New hardware needed by spuzzzzzzz · 2005-10-09 11:11 · Score: 5, Funny

It's OK, they use linux. It does infinite loops in 5 seconds.

--

Don't you hate meta-sigs?

What About... by Adrilla · 2005-10-09 09:33 · Score: 4, Insightful

Did they take into account the information that is being created as they are indexing? Do they plan on live indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that's already been done.

--

"Plans are for fools!" Oglethorpe, the Plutonian (Aqua Teen Hunger Force)

Re:What About... by antdude · 2005-10-09 09:39 · Score: 2, Insightful

And information that is being deleted.

--
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Re:What About... by htrp · 2005-10-09 09:41 · Score: 5, Interesting

I would assume that it would be to index the collective sum of information, even as it is growing. It's probably a lot quicker to index something than it is to generate it. With probable future advances in computing power and the development of new algorithms, it should be entirely possible that the speed of indexing (which already probably surpasses the speed of information production) would catch up to all the data that still hasn't been indexed.

Think of it in terms of taking a ratio comparison of two infinite series.
Re:What About... by barum87 · 2005-10-09 12:05 · Score: 2, Insightful

Most of the information we are creating now is already electronic; therefore it's a lot easier and less time consuming.
Re:What About... by Max+Nugget · 2005-10-09 12:19 · Score: 3, Interesting

Did they take into account the information that is being created as they are indexing? Do they plan on live indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that's already been done.

Funny you mention that. In some versions of Superman, Brainiac, a living computer whose mission is to gather all information about every planet in the universe, entered into the world of villainy because he logically reasoned that the only way he could ever "complete" his mission would be to gather information about each planet and then destroy the planet, since allowing the planet to continue existing would result in a never-ending cycle of new information that would need to be recorded, making it impossible to ever reach a "done" state. Not surprisingly, then, Brainiac's goal is ultimately to destroy the entire universe. :)

Re:The major question is by Anonymous Coward · 2005-10-09 09:34 · Score: 2, Insightful

I agree. At Google's scale and beyond, the concept of 'information' is such a wooly one.

How the hell did they come to that figure of 300 years?

300 years? by RonnyJ · 2005-10-09 09:35 · Score: 5, Funny

300 years? I'd have thought their other plan would have been a lot quicker.

I'd like my house indexed by obli · 2005-10-09 09:36 · Score: 5, Funny

How long until Google decides that your house is information? Just imagine an army of small robot spiders invading your home every night, registering the position, name and contents of every single object you own, making it searchable from house.google.com. Unless you nail a robots.txt to your front door, that is...

Re:I'd like my house indexed by jacksonj04 · 2005-10-09 10:30 · Score: 5, Funny

locate:keys | pocket
locate:phone | pocket
locate:underwear -girlfriend | rm

--
How many people can read hex if only you and dead people can read hex?
Re:I'd like my house indexed by WilliamSChips · 2005-10-09 11:28 · Score: 3, Funny

locate: girlfriend: no such thing for a slashdotter

--
Please, for the good of Humanity, vote Obama.

Everybody! by Slashdiddly · 2005-10-09 09:42 · Score: 5, Funny

Please stop creating new information and let Google catch up! You can resume later.

Yeah right.. by Klowner · 2005-10-09 09:42 · Score: 5, Funny

It's going to take them a hell of a lot longer than that, considering my car keys are always moving.

When I read the summary by colonslashslash · 2005-10-09 09:42 · Score: 5, Funny

I immediately thought of the Futurama episode - The Why of Fry - where the giant brains build the brainsphere and assimilate all the knowledge in existence, before attempting to destroy the entire universe so no new information can be added.

Googlesphere anyone?

--
She's built like a steak house, but she handles like a bistro....

On a related note... by RyanFenton · 2005-10-09 09:42 · Score: 4, Interesting

I wonder how many man-years it would take to listen to all the music and video that could be indexed. Be interesting at least to find out what the order of magnitude would be - millions, or perhaps billions or trillions of man-years of unique recorded audio and video? It would have to be a game of gross estimation - but it would at least put into perspective how much material is out there, even if most of it is boring "security" footage, compared to the scope of our lives.

It'd be interesting, if, perhaps in a couple generations, we could have a cheap media volume that contained "recorded media, prehistory - to - 2050ad"... if the media that exists today even survives a couple generations, and copyrights aren't extended indefinitely. The idea of an indexing system that can even put all that information into a meaningful context would be fascinating to consider though, if it could be possible.

Ryan Fenton

Competition? by psst · 2005-10-09 09:44 · Score: 4, Interesting

From the article:

Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he said earlier during his speech.

Storing 5 million terabytes has got to cost a lot of resources. It would be very inefficient if every competing search engine stored that much data. Makes me wonder if it would make more sense to nationalize Google's index and share it amongst competitors (just like it makes more sense for governments to build airports and share them amongst airlines rather than every airline building its own airports).

Re:Competition? by Shihar · 2005-10-09 10:29 · Score: 4, Insightful

Nationalize Google? Are you joking me or just insane? You want to take one of the most innovative and successful companies that the US has right now and nationalize it!?

I have a better idea, how about you just send out a government hit squad to put a bullet between the eyes of every single entrepreneur in the US. It will accomplish the same sort of freeze in the growth of innovative small businesses but look far less insane.
Re:Competition? by Halfbaked+Plan · 2005-10-09 12:00 · Score: 4, Interesting

Oh, come on. You're talking about a company that is mostly an advertising enterprise now. Who is Google hiring? Admen and their ilk. It's sometimes depressing how enamored the 'community' has become of a company whose main purpose is leveraging eyeballs to look at their ads.

(how DARE I say anything bad about Google. Mod this down IMMEDIATELY.)

--
resigned

I'm curious... by DeepBlueDay · 2005-10-09 09:46 · Score: 2, Interesting

How is 'information' defined in this context? Is a thirteen-year-old girl's blog considered information?

Re:I'm curious... by Anonymous Coward · 2005-10-09 09:52 · Score: 2, Funny

The blogs of thirteen-year-old girls are examples of the recently discovered negative information. If more young girls can be encouraged to write this will actually reduce Google's workload.
Re:I'm curious... by Hogwash+McFly · 2005-10-09 10:05 · Score: 2, Insightful

Only if it includes her home address.

--
Mother, do you think they'll like this sig?
Re:I'm curious... by vidarh · 2005-10-09 10:37 · Score: 5, Insightful

I take it from that comment that you don't see much value in a thirteen year old girl's blog? What about a thirteen year old girl's diary?
Like Anne Frank's?
Fact is, it's incredibly hard to determine today what will have value tomorrow. Most of those thirteen year old girls' (or 20-something geek guys') blogs will have no historical value. But some of those people will grow up to have a profound impact on the world (or they may not grow up, but still have a profound impact, as was the case with Anne Frank). It may be ten years from now. Or 50.
Who knows what the writing they do now might tell us about what brought them wherever they end up? When people write diaries on paper chances are reasonable they'll survive and show up in an attic somewhere. But as more and more content gets online, we also risk facing the loss of entire generations' worth of many types of information to bit rot and simple lack of foresight.
Re:I'm curious... by Vellmont · 2005-10-09 12:50 · Score: 4, Interesting

I think the parent's question is perfectly valid. What is considered "information"? I'd consider a blog information, but is a painting some random artist creates included in this list of "information"? Is my laundry list information? How about my individual handwriting in my laundry list?

The question of whether something is valuable isn't exactly an either-or proposition, but a matter of assigning a probability that a certain piece of information is valuable. Couldn't we agree that say the president's day-to-day activities are more likely to be important in 100 years than say a single 13-year-old's blog? Does that mean that 13-year-olds' blogs are worthless? Well no, but they aren't the thing I'd first choose to preserve.

The question I have is, is the greater difficulty in control over online information balanced by the greater ease of keeping it around? Google doesn't delete messages from email for this very reason. We tend to throw stuff away because it takes up too much space, or because it just becomes clutter. But with increased storage space every year and better ability to keep track of it (and separate it from things we consider important), why ever throw away information?

Online information portability is obviously a problem. How do you move someone's blog somewhere else, and have it mean anything in say 50 years? I think these problems will be solved as people expect information to be more portable and standardized. The solutions I think will come from the short term portability and needs rather than a few people wanting to preserve something for the next 100 years though. Many people make the assumption that standards are short lived things that are here today, gone tomorrow. I'd have to disagree on a historical basis. How old are reel to reel tapes? You can still find a player at say a thrift store. CD-audio has been around for 25 years and is still the default medium for music today. ASCII was developed I don't know how long ago and yet still is quite popular and if you have a computer that can't read it, you've got a fairly useless computer. Standards have a way of snowballing and gathering momentum to live on a long time.

--
AccountKiller

300 Years? Feed Those Pigeons! by Comatose51 · 2005-10-09 09:47 · Score: 4, Funny

Obviously they're not feeding those pigeons enough. Time to buy some quality feed, Google. Maybe even slip in some uppers every now and then. If all else fails, maybe it's time to consider the parrot upgrade. They're a lot more expensive but their index/poop ratio is much better.

--
EvilCON - Made Famous by /.

what is considered information? by bwy · 2005-10-09 09:48 · Score: 3, Insightful

I'd like to see their definition of information. Certainly, a lot of things that are already of common interest are on the net. Occasionally, I find things that aren't available online but the greatest majority of the time google is able to find what I want.

To further the example: at work we have several filing cabinets that haven't been opened in years. There are lots of papers and stuff in there, I can vouch for that. Some might consider it "information." But in reality all that stuff could be burned and I doubt it would make the slightest difference in the way the future rolls out. None of it is stuff that would ever be needed by an IRS audit or anything like that either. Does google consider this kind of stuff as part of their efforts? Because I think they can safely ignore it.

I Call Bullshit by Anonymous Coward · 2005-10-09 09:49 · Score: 3, Funny

It's going to take 300 years to index the grammer and spelling mistakes on Slashdot alone.

Re:I Call Bullshit by TRS80NT · 2005-10-09 10:35 · Score: 3, Informative

It's grammar.

--
Lorem ipsum dolor sit amet.

Not the Moore model but the Bono model by tepples · 2005-10-09 09:52 · Score: 2, Interesting

No, the proper model is not Moore's law but Bono's law. If it takes 300 years now, then it'll take 320 years in 20 years, and most of the time will be spent waiting for exclusive rights to expire (if they ever do). For instance, indexing a literary work that's out of print and not widely available at libraries requires getting a new copy, and those aren't available until the copyright runs out.

Makes no sense by bobintetley · 2005-10-09 09:52 · Score: 4, Insightful

We did a math exercise? What exercise?

To estimate the time involved, you surely need to know the size of the information involved (don't quote me that bunkum about 170 terabytes in TFA - yes I did read it), and to know the size you need to know what all the information is, which you can't (and surely new information is created all the time?).

This translates as "I pulled my finger out my ass, waved it in the air and came up with 300 years."

Re:The major question is by michaeltoe · 2005-10-09 09:58 · Score: 2, Interesting

Because once it's all there, you don't have to look for it anymore.

Re:And the Winner is... by Fermatprime · 2005-10-09 10:12 · Score: 2, Funny

Well, it's just Zeno's paradox. Let's say it takes them 300 years to index all today's information, then another 150 to index all the new information, then another 75... By 2605, all information to that point will have been indexed by Google. Then they can start indexing the FUTURE.
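
For anyone who wants to check the 2605 figure: the halving rounds above form a geometric series. A minimal sketch, assuming each round takes exactly half as long as the previous one (the function name and defaults are just for illustration):

```python
def total_years(first=300.0, ratio=0.5, rounds=None):
    """Sum first + first*ratio + first*ratio**2 + ...

    With rounds=None, use the closed form first / (1 - ratio),
    which is only valid for 0 <= ratio < 1.
    """
    if rounds is None:
        return first / (1.0 - ratio)
    return sum(first * ratio**k for k in range(rounds))


# 300 + 150 + 75 + ... converges to 600 years, so starting in 2005:
finish_year = 2005 + total_years()
print(finish_year)  # 2605.0
```

The first three rounds alone (300 + 150 + 75) already cover 525 of the 600 years.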

--
I hate the one hundred and twenty character limit for signatures with an all-enveloping, all-destroying, incredible pass

webcams and other continuous data collectors by G4from128k · 2005-10-09 10:21 · Score: 3, Interesting

This analysis must exclude entire categories of continuous data collection devices such as webcams, data loggers, OS log files, sensing equipment etc. All jokes aside about porn on webcams, I can imagine that future historians would love such a rich data source on how people lived their lives, what they have in their surroundings, etc.

The point is that many current systems spew a huge volume of low value (but nonzero value) data (multiple MB or GB/day/device). The lack of storage means most of this is not captured and is thus never indexed.

Even massive companies can't keep all their data. Wal-Mart stores on the order of 460 TB in their data warehouse, but only has room for the last 13 months of data or so. At 138 million customers per week, they only have room for a paltry 59kB per customer per week.
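
The 59kB figure can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes decimal terabytes (10^12 bytes) and treats 13 months as 13/12 of a 52-week year; both are assumptions, not stated above:

```python
TB = 10**12                      # decimal terabyte (assumption)

warehouse_bytes = 460 * TB       # data warehouse size quoted above
weeks_retained = 13 * 52 / 12    # 13 months is roughly 56.3 weeks
customers_per_week = 138_000_000

bytes_per_customer_week = warehouse_bytes / (weeks_retained * customers_per_week)
print(f"{bytes_per_customer_week / 1000:.0f} kB per customer per week")
```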

--
Two wrongs don't make a right, but three lefts do.

a small margin of error by CupBeEmpty · 2005-10-09 10:22 · Score: 3, Informative

I think it is important to remember that this was a math exercise not a serious study with predictive power. I remember several years ago people thought the human genome project was insane. They thought it would take hundreds of years to catalog our entire genome and cost some ludicrous numbers of trillions of dollars.

Then:

In 1999, the goal of producing a "working draft" seemed very far away, with less than 15 percent of the genome sequenced. If the accelerated goals had not already generated a sense of urgency in the consortium, a decision by the sequencing center leaders at a February meeting in Houston would. At the meeting, the leaders accepted Dr. Collins' challenge to ramp up their efforts to produce a "working draft" by spring of 2000. By January 2000, the centers were collectively producing 1,000 base pairs a second, 24 hours a day, seven days a week, and 2 billion of the human genome's 3 billion base pairs were sequenced by March. At a White House ceremony hosted by President Bill Clinton in June 2000, Dr. Collins and J. Craig Venter of Celera Genomics, which had carried out its own sequencing strategy, announced that the majority of the human genome had been sequenced. [from here]

I tried to find the graph of speed over time because I have seen it several times. It shows the exponential increase in the speed of the project. Apparently there are many scientists that believe with techniques as they are now we could repeat the project in 2 years if we started over. The indexing of information could have a very similar timeline. Very slowly at first and then as technology and specific methodology develop off you go. So the truth is... this is a guess. I wouldn't put too much faith in it.

Re:10,000,000 years by Hogwash+McFly · 2005-10-09 10:25 · Score: 2, Insightful

What question is that? What happens inside a woman's head?

--
Mother, do you think they'll like this sig?

was he joking ? by flynt · 2005-10-09 10:27 · Score: 5, Insightful

"We did a math exercise and the answer was 300 years," Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. "The answer is it's going to be a very long time."

Since this was in response to an audience member's question, does anyone else think he was joking? Because it is such an outlandish question from an information theory and modeling point of view, perhaps he was mocking it? "Ah yes, we just came up with an equation and it should take 294.59 years." I think this also makes sense in light of his next comment, which was made on a more serious note. I interpret it, "We really didn't use an equation, it will obviously take a long time though." This is how I understood his comments, and I may be wrong, but it wouldn't surprise me if some reporter picked up on this "joke" and put it up as "news".

Re:i hereby propose by b100dian · 2005-10-09 10:33 · Score: 5, Funny

...Google indexed it all in 6 days, and took a rest on the 7th...

--
gtkaml.org

There are a lot of areas where essentially this... by hackwrench · 2005-10-09 11:06 · Score: 2, Interesting

question is asked, and they seem to miss that the answer is that it is its own index.

from a logical point of view by ronsta · 2005-10-09 11:10 · Score: 2, Funny

Please correct me if I am wrong, but wouldn't solving a problem like this create more information than you previously had? And wouldn't you have to index that information and so on and so forth?

Also, can someone explain to me how you even approach something like this from a mathematical model point of view? How did the 170 terabyte number even come up? Aren't there different definitions for what constitutes 'information?' Also, who the hell spent their 20% on this problem when there was integral code for vital programs to write, such as Google Suggest and Google Suggest in Japanese?

PLEASE, SOMEONE EXPLAIN BEFORE I GO OFF INTO MORE OF A FLAMEBAIT RANT!

--

Martini Glasses

Re:The major question is by jupiter909 · 2005-10-09 11:13 · Score: 3, Informative

They take the rate of current indexing of data, then take the rate at which data is being added to the pile by looking at current trends and possible future trends of people hooking up to the net and adding to that pile, then take the rate at which their systems advance to do the indexing of that pile. They then pass those variables through a custom magic google app and wait a bit and then, tada, the answer 300 is spat out.

You need to remember that they could be way off: if some major breakthrough in storage technology happens tomorrow, all those figures would need to be recalculated. At best it is a very, very rough idea of how long it is going to take them to catch up to the world's information and keep it in a current index.
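
To make that kind of estimate concrete, here is a rough sketch of one way such a calculation could go. It is not Google's actual method. It assumes the indexed volume and the world's data each grow by a fixed factor per year; the 170 TB and 5 million TB figures are from the article, and both growth factors are invented purely for illustration:

```python
import math


def years_to_catch_up(indexed_tb, total_tb, index_growth, data_growth):
    """Years until the indexed volume equals the total volume, assuming
    both grow by a constant factor per year.

    Returns math.inf if the index never grows faster than the data.
    """
    if indexed_tb >= total_tb:
        return 0.0
    if index_growth <= data_growth:
        return math.inf
    # Solve indexed * index_growth**t == total * data_growth**t for t.
    return math.log(total_tb / indexed_tb) / math.log(index_growth / data_growth)


# Article figures for volume; the 10% and 6% yearly growth rates are made up.
estimate = years_to_catch_up(170, 5_000_000, index_growth=1.10, data_growth=1.06)
print(f"{estimate:.0f} years")
```

Small changes in either growth factor swing the answer by centuries, which is the point about the figure being very rough.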

My guess: by imsabbel · 2005-10-09 11:23 · Score: 3, Informative

Stuff like this (or years ago for LHC) most likely follows this approach:

They estimated an amount of information that is "all information", like 480 000 Exabyte or so.
Then they look at their current capacity (storage and database CPU power) and just extrapolate Moore's law into the future and look when the demand will be met.

Of course, for stuff like the LHC that only extrapolates 10-20 years into the future such a thing is possible, but 300 years? He should read up about the singularity...
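
A minimal sketch of that extrapolation, assuming capacity simply doubles at a fixed interval. The 1.5-year doubling period is a common reading of Moore's law and is an assumption here, as is plugging in the article's 170 TB and 5 million TB figures:

```python
import math


def years_until_capacity(current, target, doubling_years=1.5):
    """Years until capacity reaches target if it doubles every
    doubling_years. Units cancel, so any consistent unit works."""
    if current >= target:
        return 0.0
    doublings = math.log2(target / current)
    return doublings * doubling_years


# Article figures in terabytes; the doubling period is hypothetical.
example = years_until_capacity(170, 5_000_000)
print(f"{example:.1f} years")
```

Under pure doubling the answer comes out in decades rather than centuries, so a 300-year figure implies either a much larger target or a much slower growth assumption.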

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?

Re:10,000,000 years by eonlabs · 2005-10-09 11:25 · Score: 3, Funny

Hey, check it out! I'm in the lead!!!! ^_^

--
I wouldn't consider the mad hatter mad. Just reality impaired. He sure can make a mean cup of tea.

Define "all" ... by jabberwock · 2005-10-09 11:25 · Score: 2, Insightful

Noted that:

"Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he (Schmidt) said earlier during his speech."

So ... how many terabytes of info will be produced in the next 300 years, and does anyone really think that Google (and anyone, or everyone) could keep up?

Especially, once all 20 billion people who live in the Solar System are video-documenting every moment of their existence ...

OK, so I project and exaggerate ...

Scientific and Mathematical bunk by markdj · 2005-10-09 11:42 · Score: 2

The article gives us no facts that we can use to verify the claim. Without a definition of information and a definition of indexing one cannot take this for accurate. There are many definitions of information and except that used in "Information Theory", which is a message received and decoded to its original form, I don't know of any definition that has scientific or mathematical rigour. In fact, in my opinion, Information Theory is a misnomer and is more properly called Communication Theory since it is about getting a message properly communicated, NOT about whether its contents are useful. Additionally, information as understood by most comes in many forms and types and each may require different ways of indexing. Finally, aren't the indexes information that needs to be indexed? How do you keep from recursing?

Re:The major question is by NickFitz · 2005-10-09 11:43 · Score: 2, Funny

Seriously, though, why would anyone want to index all the info in the world? That's kinda weird, in my opinion.

New here?

--
Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.

Uh oh by harlows_monkeys · 2005-10-09 12:18 · Score: 2, Funny

Uh oh...someone needs to visit Applied Cryogenics and knock 700 years off Fry's timer then.

Re:The major question is by Kevin+Mitnick · 2005-10-09 12:37 · Score: 3, Funny

They're almost there

Re:But... by BewireNomali · 2005-10-09 12:58 · Score: 2, Insightful

i read the article, and this is what I got from it. i could be wrong.

-5 million TB of data.
-170 TB have already been indexed.
-it would take 300 years to index that data and make it searchable.

I don't think it's an exercise to index all knowledge. As you point out, that would be alogical. I think it's more of an understanding of what it would take to effectively and completely serve the world's information needs given current indexing capabilities.

I guess by establishing a benchmark now, both of how efficiently they index information and of the general amount of data that is out there, they can gauge how efficient they get relative to the rate at which the amount of potentially indexable data increases.

--
un burrito me trampeó.

I thought the answer was 42. by wcrowe · 2005-10-09 13:49 · Score: 3, Funny

Ask a stupid question...

--
Proverbs 21:19

False alternative by ChrisMaple · 2005-10-09 15:19 · Score: 2, Insightful

Private ownership of an airport does not mean that it would be owned by an airline. Even if an airport were owned by an airline, that does not mean it would serve only that airline. (It would not be in its best interest to do so.)

Practice has shown that government ownership and operation of airports is inferior to private ownership.

--
Contribute to civilization: ari.aynrand.org/donate

Indexing the Porn by ozTravman · 2005-10-09 15:32 · Score: 2, Funny

He didn't clarify that 299 years of that was indexing all the Internet porn sites.

Re:Oooo.... No it's the giant brains all over agai by jrockway · 2005-10-09 15:59 · Score: 2, Funny

Don't worry, I just ordered a pizza for I. C. Wiener.

--
My other car is first.

it's the definition of "index" that's a problem by Quadraginta · 2005-10-09 18:11 · Score: 2, Insightful

Look, the problem is not how much data there is in the world, the problem is finding a general automatable algorithm for organizing it in such a way that J. Random User can rapidly find what he's looking for.

Stroll on down to the nearest university library. It's got a lot less information in it than Google is considering, and about a hundred thousand man-years over a few centuries have gone into finding clever ways to organize it all: card catalogs, shelving systems (e.g. Dewey and his decimals), nowadays searchable electronic catalogs, reference books, specialized indices for law and science and medicine, citation indices, reviews, reviews of reviews...and so on and so forth forever. And yet, it can still be immensely difficult to track down a particular piece of information you want. Even if it can be done, often it takes a fair amount of expertise in a field just to know where to look. Where do you find public information on patents for desalination processes? How do you find out if anyone has synthesized a polymer resin that melts between 130 C and 150 C and is resistant to acid, with a tensile strength about X? What was the common law meaning of "ownership in fee simple" in 1680s England? Even to start looking for the answers, you often need great experience in the relevant field, so you know where to start looking -- the "search terms" we might say.

Google may be feeling its oats because they can now very rapidly provide the most obvious things people want -- directions to San Diego from Ukiah, the times and places Serenity is playing on Sunday, the lead story of the New York Times "Style" section last Sunday, or the names and addresses of the six pizzerias closest to me still open at 11:25 PM.

But this is utterly small potatoes compared to the problem of organizing information generally, so that it is useful to professionals during the weekday as well as for amusement on the weekend. It is first, generally speaking, an unsolved problem -- no library or information index I've ever used fails to have at least one frustrating "feature" that leaves me scratching my head, wondering what the heck the designers were thinking. Secondly, I very much doubt Google has the depth of professional expertise in-house to even begin to figure out how to organize all the giant repositories of information in law, science, engineering, literature et cetera in such a way that professionals can use them, let alone amateurs.

And finally, they don't have the money to do it, and it will be very hard for them to raise it. Indices have suffered from this problem for a long time: any given user will only pay a very small price per search, but it costs a huge amount to make the index. Heretofore, makers of indices and dictionaries and references have relied on selling them at very high prices to libraries, which in turn raise the money in small bits from their patrons, or taxes. But Google would cut out the library middleman -- you search directly. So how are they going to cover their costs? They've no easy way to charge you $0.005 every time you do a Google search, for example.

In short, this sounds like the 21st century equivalent of that 1950s nuclear energy braggadocio, "energy too cheap to meter." Call it "information too cheap to meter." Color me skeptical.

Slashdot Mirror