The Math Behind PageRank
anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the type of mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links constituting the web are constantly changing and updating, the relevance of pages needs to be recalculated on a continuous basis."
But 9,000 of those words are slang for parts of the human anatomy. Go figure.
I have sites with a PR of 6, and I can tell you that they got that way because of inbound links from other sites. In fact, when other sites dropped those links, my PR dropped (to 5, and even to 4). Getting more inbound links brought the PR back.
Think about those links, too. How often do you use common words in an HREF? I don't think there's a lot of weeding out of common words since the link to a site is usually either its name, or a description containing some important keywords.
I love seeing these technoscientists think they understand PageRank, but just like TimeCube, they're way, way off.
Whatever Google's doing with PageRank, it seems to be doing it right. At least from my experience.
The article specifically says the PageRank eigenvector is only recalculated about once a month. Even though Google uses some clever numerics to compute the eigenvector of a 25 billion by 25 billion matrix by iteration, it still takes several hours to finish.
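For anyone curious what "by iteration" looks like, here is a minimal power-iteration sketch on a four-page toy web. The function name `pagerank`, the damping factor of 0.85, the tolerance, and the toy link graph are all assumptions for illustration; this is not Google's production method, just the textbook idea of repeatedly pushing rank along links until it stops changing.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to (toy data,
    not a real crawl). damping=0.85 is an assumed, commonly quoted value.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Rank sitting on pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its damped rank to its link targets in equal shares.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:  # stop once the vector has settled
            break
    return rank


# Hypothetical toy web: A links to B and C, B and D link to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Each pass touches every link once, so the cost per iteration grows with the number of links rather than with the full matrix size, which is why iteration is feasible at all on a sparse web graph.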
It seems like it would be the nouns, pronouns, etc. that Google should be paying attention to. Who cares about all the verbs, adjectives, etc. that just muddy the indexing waters?
I read about this some time ago ... I think the paper was entitled "The 10 billion dollar Eigenvector: The math behind google" or something to that effect. Sorry, but I've got a new laptop and cannot find the exact title. It was an excellent introduction for beginner computational scientists to an application of the eigenvector. I forget which American university was responsible.
I skimmed the article and didn't find what I wanted to find. If you make a webpage that you want ranked high, what do you do? Do you make 100 geocities accounts and provide links to your main website, or what? I'm just wondering this out of curiosity, not out of need.
God spoke to me.
I didn't think pigeons had much mathematical ability. Or does this mean they've abandoned the biological approach?
As a self-proclaimed SEO expert - I honestly don't believe PageRank counts nearly as much as it did a few years ago! You'll find lots of PR5 sites ahead of PR9 sites in the SERPs!
As I was reading the article summary I thought it was going to say 95% of the 25 billion pages indexed by Google consist of spam blogs.
I asked some math website to put a link to http://www.mathpotd.org/ Math Problem of the Day -- they don't bother to do so. They know the math and use it.
Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods.
They use a set of nested if-else statements
*ducks*
SELECT advertiser, description, link, adcost
FROM tblAdvertisers
WHERE adword LIKE '%searchstring%'
ORDER BY adcost
1971-2006
From Crave to Grave.
Assuming we're using the standard meanings of "random", "together" and "row", there are exactly 3 combinations (note: there's no need to distinguish between the sophomores as individuals).
1. J S S = sophomores together
2. S J S = not together
3. S S J = sophomores together
Thus, the correct answer is 2/3. But for some reason, the site insists that the answer is 1/3.
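The 2/3 figure can be checked by brute force. This is a small sketch that assumes the puzzle is "two sophomores and one junior sit in a random row; what is the chance the sophomores are next to each other?" (that reading is inferred from the three arrangements listed above, and the labels `S1`, `S2`, `J` are made up for the example). It keeps the sophomores distinct so every seating is equally likely.

```python
from fractions import Fraction
from itertools import permutations


def prob_sophomores_adjacent():
    """Enumerate all seatings of two sophomores and one junior in a row."""
    people = ["S1", "S2", "J"]  # hypothetical labels for the three students
    orders = list(permutations(people))  # 3! = 6 equally likely seatings
    # The sophomores are together when their seat positions differ by one.
    together = [o for o in orders if abs(o.index("S1") - o.index("S2")) == 1]
    return Fraction(len(together), len(orders))


probability = prob_sophomores_adjacent()
print(probability)  # 2/3
```

Four of the six seatings put the sophomores side by side, which matches the three-pattern count above since each pattern corresponds to exactly two seatings.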
Sure is no string theory. For some strange reason most useful mathematics applied to computers is rather simple; compare that with the sophisticated mathematical tools used in theoretical physics!
The fact that the mathematics involved is simple is irrelevant; the only thing that matters is the practical usefulness.
ORDER BY adcost DESC
bite my glorious golden ass.
I don't have any Mod points right now, but isn't a reply to an Offtopic post pretty much automatically offtopic? Go ahead and mod me Offtopic, I'll consider that an affirmative answer.
...the future crusty old bastards are already drinking the Kool-Aid.
There's only two that really reflect the power of Pagerank: Click here.
About 1.2 billion pages, and surprise surprise, Acrobat Reader tops the list, followed by a who's who of internet applications and plugins. But around result #30 it gets a bit more interesting, and when you're a few dozen pages in, "new patterns begin to emerge."
And to explain why not to use "click here", I found this buried on page 45. Thanks for the proof pudding guys, it's delicious.
With a name like markov_chain I thought you were going to lay the math-smackdown on the GP. After all, page rank is basically a Markov chain where the graph it is acting on is the internet.
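To make the Markov chain view concrete, here is a rough "random surfer" simulation: a walker follows a random outgoing link most of the time and occasionally jumps to a random page, and the fraction of time spent on each page estimates its rank. The function name `random_surfer`, the 0.85 follow probability, the step count, the seed, and the toy graph are all assumptions for illustration, and the sketch assumes every link target also appears as a key in `links`.

```python
import random


def random_surfer(links, steps=200_000, damping=0.85, seed=0):
    """Estimate page importance as the visit frequency of a random walk.

    links maps each page to its outgoing links; every target is assumed
    to be a key of links as well. All parameter values are illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        outs = links[page]
        if outs and rng.random() < damping:
            page = rng.choice(outs)   # follow a link on the current page
        else:
            page = rng.choice(pages)  # get bored and jump anywhere
    return {p: visits[p] / steps for p in pages}


# Hypothetical toy web: A links to B and C, B and D link to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
freq = random_surfer(toy_web)
for page in sorted(freq, key=freq.get, reverse=True):
    print(page, round(freq[page], 3))
```

The visit frequencies are only a noisy estimate of the chain's stationary distribution; the eigenvector computation gets the same quantity without the sampling error.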
The algorithms behind PageRank are no secret. Why not just read about them from the source?
http://www.google.com/search?hl=en&q=coupons
Take a look at the top 20 results. You'll notice that 5 of these 20 results are from the same guys.
It seems that in searching "The Who", only that exact phrase is returned, but when searching the who, both words are searched, i.e. "the" appears as if it is being searched like a normal word here. If you try searching for the best, "the" is counted when used as part of the phrase "the best", but appears not to be counted when it appears by itself. The Google algorithm is apparently a lot more complicated than the usual explanations are.
I think we can get four or five tomorrow.
My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.
I now have a nice basic understanding of Google's page ranking system. That's all I was asking for.
Great article.
The character of online content is now changing rapidly. We used to be in an Internet where mostly only the site provider determined the content on the pages they served (/. being a notable, early exception). Now, with the rise of "2.0" systems, user-generated content, and empowerment of the individual - the content being served on many sites is coming into sites from wide groups, and being moderated and curated by those groups.
So... a thought: as user-submitted and group-moderated content continues to rise on the Internet - the main premise behind the PageRank system will change. To remain relevant, Google will need to continue to evolve how they do their rankings to match the structure of data in the online world. Will/Can they?
For a different, somewhat more technical, but more succinct discussion, Cleve Moler [of Matlab fame] wrote another view of this topic, about 5 years ago.
The math is the same, of course, but two points of view may provide a greater sense of perspective. So to speak. And Cleve is always worth listening to.
Pigeon Rank: http://www.google.com/technology/pigeonrank.html
Just thought I'd add this shameless plug here for my uni...I'm currently taking an undergraduate course in linear algebra at Stanford (Math 51; it's taken by a lot of freshmen) and we studied almost everything the article talked about earlier in the quarter. So, moral of the story - if you want to learn interesting stuff, come to Stanford!
I thought some people might be interested...
neat search! page 28, at the bottom, right in a row, state of texas, gop.com, then realplayer
there's some funny stuff in there! What humans really think is worth clicking here for and close linkages in group mindset
Pagerank is not a number between 1 and 10. There are lots of other good citations to back me up but I'm too lazy to find them right now. And too lazy to log in.
A system error caused the problem. It didn't insist it be 1/3 -- it was because choice B (which didn't correspond to any choice) was given as the correct answer. The website support has corrected the problem.
part of the original text is
8. Appendix A: Advertising and Mixed Motives
They discuss the motives of search engines and the inability of a for-profit search engine to correctly answer a search when it is in direct competition with its advertisers.
Popularity != Quality. That's the way I see pagerank and other search response methods.
I think pagerank gives relatively bad results because of a flawed notion that something that is popular is authoritative for determining what you are searching for.
Unfortunately computers are not very good at comprehending the data they read and understanding in context.... yet.
I think, all we have so far in search engine ranking is a bunch of "near enough is good enough tricks".
I hope someone in a garage somewhere finds a better trick, I am getting bored of the way search engines search.
Did anyone else read that as "The MYTH behind Pagerank"
??
Seems unfair that something Brin and Page developed together would bear only one of their names.
"Page-rank"
??
And they predictably bitch-slapped this thread.
???
:/
6 (almost 7) hours later and this post wasn't modded freaking hilarious??
I apologize for that sir, you're +500 funny by me.
And yes I'm posting AC too..
And yes I just got on reading this..
And no I don't have any mod friggin points
I've seen links on google searches that don't exist anymore but were ranked highly when they DID exist and still exist in the top 10 of the query. What happens to those? Do they stay at their ranking till they get overtaken by other more popular pages on the same search? Get their ranking slowly reduced because they don't exist?