The Math Behind PageRank
anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the type of mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links constituting the web are constantly changing and updating, the relevance of pages needs to be recalculated on a continuous basis."
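To make the "95% of the text is the same 10,000 words" statistic concrete: it is a coverage measure, the share of all word occurrences that fall among the k most frequent words. The summary doesn't say how its figure was measured, so this is only a minimal sketch, assuming a crude lowercase-letters tokenizer; the function name and sample sentence are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def top_k_coverage(text: str, k: int) -> float:
    """Share of word occurrences in `text` that belong to its k most frequent words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the k most common words.
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tokens)


sample = "the cat sat on the mat and the dog sat on the log"
print(top_k_coverage(sample, 2))  # the top 2 words cover 6 of the 13 tokens
```

Run over a large crawl with k = 10,000, a value near 0.95 would correspond to the summary's claim.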
But 9,000 of those words are slang for parts of the human anatomy. Go figure.
I have sites with a PR of 6, and I can tell you that they got that way because of inbound links from other sites. In fact, when other sites dropped those links, my PR dropped (to 5, and even to 4). Getting more inbound links brought the PR back.
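For context, the formula in Brin and Page's original 1998 paper makes this link dependence explicit. With pages T_1 through T_n linking to page A, C(T) the number of outbound links on page T, and damping factor d (the paper suggests 0.85):

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

Every inbound link adds one term to the sum. Holding the other pages' scores fixed, losing the link from T_n lowers PR(A) by d·PR(T_n)/C(T_n). The 0-10 figure in the toolbar is only a coarse integer display of the underlying real-valued score, so the formula explains the direction of the change, not the exact step sizes.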
Think about those links, too. How often do you use common words in the text of a link? I don't think much weeding out of common words is needed, since the link to a site is usually either its name or a description containing some important keywords.
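Brin and Page's 1998 paper describes associating a link's anchor text with the page the link points to, which is why link text matters here. As a minimal sketch of the first step only, this stdlib parser pulls (href, text) pairs out of a page; the class name and sample HTML are hypothetical, and this is not Google's code.

```python
from __future__ import annotations

from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> currently open, if any
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only <a> tags that carry an href are treated as links.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a link (including nested tags like <b>).
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._parts).split())
            self.anchors.append((self._href, text))
            self._href = None


page = ('<p>Get it <a href="http://example.com/reader">click here</a> or visit '
        '<a href="http://example.com/">Example <b>Site</b></a>.</p>')
collector = AnchorTextCollector()
collector.feed(page)
collector.close()
print(collector.anchors)
```

Aggregating these pairs by href across many pages gives each target URL a bag of descriptive words supplied by other sites.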
I love seeing these technoscientists think they understand PageRank, but just like TimeCube, they're way, way off.
The article specifically says the PageRank eigenvector is recalculated only about once a month. Even though Google uses some clever numerics to compute, by iteration, the principal eigenvector of a 25 billion by 25 billion matrix, it still takes several hours to finish.
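The article only says the eigenvector is found by iteration. As a rough illustration, here is the textbook power method on a toy graph, assuming damping factor d = 0.85 (the value from Brin and Page's original paper) and assuming that rank held by pages with no outbound links is spread evenly over all pages, which is one common convention. The function name and toy graph are hypothetical; a real crawl needs sparse storage and distributed computation, not Python dicts.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Power iteration for PageRank on a small in-memory link graph.

    `links` maps each page to the pages it links to. Scores sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform vector
    for _ in range(max_iter):
        # Assumption: pages with no outbound links ("dangling" pages)
        # spread their rank evenly over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page splits its damped rank equally among its outbound links.
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        # Stop once two successive iterates differ by less than tol (L1 norm).
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

Running the script prints the pages in descending order of score; C comes out on top because three of the four pages link to it. Each pass costs time proportional to the number of links, which is why repeated passes are practical at a scale where a direct eigensolver is not.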
It seems like it would be the nouns, pronouns, etc. that Google should be paying attention to. Who cares about all the verbs, adjectives, etc. that just muddy the indexing waters?
I read about this some time ago ... I think the paper was entitled "The 10 billion dollar Eigenvector: The math behind Google" or something to that effect. Sorry, but I've got a new laptop and can't find the exact title. It was an excellent introduction, for beginning computational scientists, to an application of the eigenvector. I forget which American university was responsible.
I skimmed the article and didn't find what I wanted to find. If you make a webpage that you want ranked high, what do you do? Do you make 100 GeoCities accounts and have them link to your main website, or what? I'm just wondering out of curiosity, not out of need.
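Under the textbook formula from Brin and Page's original paper, the arithmetic of such a link farm is simple. Assume k throwaway pages F_1 through F_k that nobody links to, each with a single outbound link to target page A, and d = 0.85. This is only the textbook arithmetic and says nothing about how Google actually treats such pages:

```latex
\begin{aligned}
PR(A) &= (1 - d) + d \sum_{i} \frac{PR(T_i)}{C(T_i)} && \text{classic formula, } T_i \text{ link to } A \\
PR(F_j) &= 1 - d && \text{farm pages have no inbound links} \\
\text{direct gain to } PR(A) &= \sum_{j=1}^{k} d \, \frac{PR(F_j)}{C(F_j)} = k \, d \, (1 - d) && C(F_j) = 1 \\
&= 100 \times 0.85 \times 0.15 = 12.75 && k = 100,\ d = 0.85
\end{aligned}
```

The minimum score 1 - d is all a page nobody links to can pass on, so the farm's direct contribution grows only linearly in k.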
God spoke to me.
As a self-proclaimed SEO expert, I honestly don't believe PageRank counts nearly as much as it did a few years ago! You'll find lots of PR5 sites ranking ahead of PR9 sites in the SERPs!
I asked some math websites to link to Math Problem of the Day (http://www.mathpotd.org/), but they don't bother to do so. They know the math and use it.
Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods.
They use a set of nested if-else statements
*ducks*
SELECT advertiser, description, link, adcost
FROM tblAdvertisers
WHERE adword LIKE '%searchstring%'
ORDER BY adcost
Interestingly enough, Google thinks so, too.
Of course, Yahoo has its own opinion.
Although AltaVista seems to almost agree. Check the second non-advertised result.
I do find this amusing though. Third place, how humble.
I didn't expect such interesting results. The site with the search term in its URL was at the top for AltaVista and Yahoo, but not for Google. Yahoo ranked the wiki entry above Google, but AltaVista reversed that decision, and Google, of course, ranked itself above the wiki. Google's own reference site was number one in its own search and near the top in the other two, but pagerank.net wasn't even in the top 10 for Google's search. I'm not sure what conclusions can be drawn from all that, but it is definitely food for thought.
ORDER BY adcost DESC
bite my glorious golden ass.
There are only two that really reflect the power of PageRank: Click here.
About 1.2 billion pages, and surprise surprise, Acrobat Reader tops the list, followed by a who's who of internet applications and plugins. But around result #30 it gets a bit more interesting, and when you're a few dozen pages in, "new patterns begin to emerge."
And to explain why not to use "click here", I found this buried on page 45. Thanks for the proof in the pudding, guys; it's delicious.
The algorithms behind PageRank are no secret. Why not just read about them from the source?
It seems that when searching for "The Who", only that exact phrase is returned, but when searching for the who, both words are searched, i.e., "the" is treated like a normal search word here. If you search for the best, "the" is counted when used as part of the phrase "the best", but appears not to be counted when it appears by itself. The Google algorithm is apparently a lot more complicated than the usual explanations suggest.
I think we can get four or five tomorrow.
My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.
I now have a nice basic understanding of Google's page ranking system. That's all I was asking for.
Great article.
The character of online content is now changing rapidly. We used to have an Internet where, mostly, only the site provider determined the content on the pages they served (/. being a notable, early exception). Now, with the rise of "2.0" systems, user-generated content, and empowerment of the individual, the content served on many sites comes from wide groups of users and is moderated and curated by those groups.
So... a thought: as user-submitted and group-moderated content continues to rise on the Internet, the main premise behind the PageRank system will change. To remain relevant, Google will need to keep evolving how they do their rankings to match the structure of data in the online world. Will/Can they?
For a different, somewhat more technical, but more succinct discussion, Cleve Moler (of MATLAB fame) wrote up another view of this topic about five years ago.
The math is the same, of course, but two points of view may provide a greater sense of perspective. So to speak. And Cleve is always worth listening to.
What I found interesting about that link was the description listed for Google's entry:
Where did they get that text from? It's not anywhere to be found in the source. Did they cheat? Or are they just tricky?
PigeonRank: http://www.google.com/technology/pigeonrank.html
They got it from the Google category at the Open Directory Project at dmoz.org, mirrored at directory.google.com. Google is a user of dmoz.org data but has completely de-emphasized that as of late.
It's actually against the dmoz license agreement to use their data without a link back to the source, but nobody seems to care.
Whenever possible, Google uses the DMOZ description for the snippet shown in the results.
SEO Firefox Extension
A system error caused the problem. It didn't insist the answer be 1/3; the issue was that choice B (which didn't correspond to any actual choice) was given as the correct answer. The website support has corrected the problem.
Frankly, any university with a CS program worth anything will have students take a linear algebra course in math first thing. It's a good weed-out-the-weak exercise early on, it gets you up to speed with university-level mathematics, and the material itself comes in handy, for example in computer graphics. Being good at manipulating matrices is useful in algorithmics, too.
;-)
Please try to impress me with Stanford some other way once you've progressed further.
Things change with the keyword "search engine"... MSN is first! [ http://www.google.com/search?hl=en&lr=&q=search+engine ]
The Answer Lies in The Genome
Blah. Ugly red clothes... Go Bears!
Why does that make PageRank broken? That's not the problem it tries to solve. Google might be broken for slavishly adhering to PageRank, but that's a different matter entirely...
Shameless math plug(s) from my alma mater:
- Cal Berkeley leads Stanford in William Lowell Putnam Competition Fellows
- as for killer math events,
Stanford had Streleski (see Wikipedia)
but Berkeley topped him with Kaczynski (!)
Seriously, best wishes to the Cardinal for shepherding the Putnam team under Prof. Vakil last Saturday.
Seems unfair that something Brin and Page developed together would bear only one of their names.
"Page-rank"
Porn has no quality (just cheese ;p) and is very popular!
I'd love to see the PageRanks on those every month!
I've seen links in Google search results to pages that don't exist anymore but were ranked highly when they DID exist, and they still appear in the top 10 for the query. What happens to those? Do they keep their ranking until they get overtaken by other, more popular pages for the same search? Or does their ranking slowly decline because they don't exist?
Use Google search and try "best search engine". They don't list themselves. Then go to Yahoo search and try "best search engine"....