Larry Page: Google Was an Accident
DarklordJonnyDigital writes "Ars Technica is reporting that Google founder Larry Page has admitted that the Google project wasn't originally intended to be a search engine at all. 'It wasn't that we intended to build a search engine. We built a ranking system to deal with annotations.'" Of course, happy accidents have often been the cause of advancement, technological or otherwise.
Many great inventions and discoveries were made by accident:
Newton's law, the gravitational constant, etc.
Archimedes' law of buoyancy
Accident or not, I'm glad it happened. Search engines at that time left much to be desired. Google was simply magic. If I wanted something, it would magically appear on the first link.
The guys who created the Expand Accelerator were actually trying to develop a new encryption method when they stumbled across a method to increase virtual bandwidth.
Jerry Yang's original set of links was a Sumo wrestling enthusiast's page...that for a time was valued at $120 billion (!).
Here in Google groups..
Now can someone find the first mention of searching Google looking for the first mention of Google in Google?
Not really; previous search engines did well what they were intended to do. They searched the web by looking at each site as if it were isolated from the rest of the web.
But they used the wrong point of view: they didn't see that the web is so interlinked that how heavily a site is linked to could be a measure of how desirable it is to find that site.
Sometimes the best solutions come from just viewing a hard problem from another point of view.
After reading all the information on Google's technology, I wonder how many lines of code pagerank really is.
Do you figure it's 50k+ lines, or something very simple, only a few hundred lines?
For some reason, I don't think the pagerank algorithm is more than 1000 lines of code... I know lines of code isn't really a defining characteristic of anything, but it's still interesting...
From the article
Larry Page: "It wasn't that we intended to build a search engine. We built a ranking system to deal with annotations. We wanted to annotate the web--build a system so that after you'd viewed a page you could click and see what smart comments other people had about it. But how do you decide who gets to annotate Yahoo? We needed to figure out how to choose which annotations people should look at, which meant that we needed to figure out which other sites contained comments we should classify as authoritative. Hence PageRank.
"Only later did we realize that PageRank was much more useful for search than for annotation..."
Now think about blogging with page ranking applied. It might be much more useful than normal blogging, just as search engines with PageRank are more useful than normal search engines.
Bye egghat.
-- "As a human being I claim the right to be widely inconsistent", John Peel
Mmmm, I should check Google Labs before saying something that looks so obvious; they are already doing it in Google WebQuotes.
I wonder how many lines of code pagerank really is.
Try one equation, iterated a few times:
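The equation meant here is presumably the one from Brin and Page's original paper: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outbound links on T, and d is a damping factor (the paper suggests 0.85). A minimal Python sketch of iterating it on a made-up three-page graph follows; the function name, the input format, and the fixed iteration count are assumptions for illustration, not Google's actual code.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to (hypothetical input
    format). Every page must appear as a key and have at least one outbound link.
    """
    ranks = {page: 1.0 for page in links}  # arbitrary starting values
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Sum the rank passed on by every page that links to this one.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


# Toy graph, invented for illustration: A and C link to B, B links back to both.
toy_web = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
ranks = pagerank(toy_web)
```

On this toy graph B ends up ranked above A and C, since both of them link to it. So the core really can fit in a couple dozen lines; the hard part is running it over the whole web.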
However, the PageRank value is only one aspect of Google's ranking; for brand-new pages that haven't had time to gather links yet, Google seems to use straight textual ranking.
Will I retire or break 10K?
Google's not only an accident, but also a misspelling: It should be googol.
I'm kinda glad it got misspelled, though, because google is much cooler than googol.
Interesting googol fact from whatis.com:
Later, another mathematician devised the term googolplex for 10 to the power of googol - that is, 1 followed by 10 to the power of 100 zeros. Frank Pilhofer has determined that, given Moore's Law (which is that computer processor power doubles about every 1 to 2 years), it would make no sense to try to print out a googolplex for another 524 years - since all earlier attempts to print a googolplex out would be overtaken by the faster processor.
Who said Freedom was Fair?
...the Information Retrieval (IR) geeks reckon there's 2 major factors. You are correct that one of those is relevance, which is known as precision. And the other is recall. Think of recall as getting all the relevant results.
One of the tricks that can be used to cull irrelevant results is to cut down the total number of results. The IR dudes quickly started playing the numbers. Showing the best 20 results is better than showing the top 100 with 60 of those being irrelevant.
I like to think of these as accuracy and completeness.
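To put numbers on the two measures: precision is the fraction of the results shown that are relevant, and recall is the fraction of all relevant documents that got shown. Here is a small Python sketch; the helper name and document IDs are made up, and the 20-versus-100 scenario follows the example above under the added assumption that exactly 40 relevant documents exist in total.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for a list of shown results.

    shown: document IDs returned to the user (hypothetical IDs).
    relevant: set of all document IDs that are actually relevant.
    """
    hits = sum(1 for doc in shown if doc in relevant)
    precision = hits / len(shown) if shown else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed scenario: 40 relevant documents exist (IDs 0-39).
relevant = set(range(40))

# Top 100 with 60 irrelevant: all 40 relevant documents plus 60 junk ones.
top_100 = list(range(40)) + list(range(1000, 1060))
# Best 20: every result shown is relevant, but half the relevant ones are missed.
best_20 = list(range(20))

p100, r100 = precision_recall(top_100, relevant)  # precision 0.4, recall 1.0
p20, r20 = precision_recall(best_20, relevant)    # precision 1.0, recall 0.5
```

Cutting the list down trades recall for precision, which is exactly the numbers game described above.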
I used to occasionally browse through TREC. Seems like they have locked up the past results nowadays...
So does Anonymous Coward have good karma?
A friend of my brother-in-law was surprised to hear that there were other search engines in existence.
He thought that Google was just a standard, like HTML, FTP, Gopher, or NNTP.
That was quite the little accident they had.
Google have a top-notch system but the whole indexing thing is still laughable. They are not really taking advantage of structured markup in evaluating keywords - they extract the same information as if it were a plain text file sans markup. Yeah, sometimes top-level headers and link text are used, but that's it really.
It's good, however, to see that Google aren't resting on their laurels, as Google Labs amply demonstrates. I like Google Sets, which makes good use of list markup. When the shuttle crashed last week I was trying to remember the names of all the space shuttles, and entering Columbia, Challenger and Enterprise into Google Sets gave me the names of the other three shuttles: Discovery, Endeavour and Atlantis. A useful tool indeed.
Considering Google's purchase of Blogger announced this past weekend, I'm looking forward to more semantically based search abilities - since blogs are by their nature very structured (especially those with RSS or XML feeds).
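On the structured markup point above, here is a rough Python sketch of what using markup when evaluating keywords could look like: weight each word by the element it appears in. The tag list and weights are invented for illustration and say nothing about what Google actually does.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: a word in a title or heading counts more than body text.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "a": 2.0}


class WeightedTermParser(HTMLParser):
    """Accumulate a per-word score, scaled by the innermost weighted open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open weighted tags
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        # Text outside any weighted tag gets the plain-text weight of 1.0.
        weight = TAG_WEIGHTS[self.open_tags[-1]] if self.open_tags else 1.0
        for word in data.lower().split():
            self.scores[word] += weight


parser = WeightedTermParser()
parser.feed("<h1>Space shuttles</h1><p>The shuttles flew for decades.</p>")
parser.close()
scores = parser.scores
```

Here "shuttles" scores higher than the words that only appear in the paragraph, because it also shows up in the heading. A plain-text indexer would count every occurrence the same.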