Google's PageRank Predicts Nobel Prize Winners
KentuckyFC writes "The pattern of citations between scientific papers forms a network that has remarkable similarities to the network formed by the web. So why not use Google's PageRank, the world's most effective search algorithm, to rank these papers in the same way it ranks websites? That's exactly what a couple of US researchers have done for physics papers published by the American Physical Society since 1893 (abstract). The results make interesting reading because almost all of the top ten papers resulted in (or were linked to) Nobel Prizes for their authors. Which means that studying the up-and-coming entries on the list ought to be a good way of predicting future winners. Better get your bets in before the bookies get wind of this."
great minds think alike? Or does it just reflect what we like to see and how we like to think?
Preparing for an inundation of people citation bombing each other in 3 ... 2 ... 1 ...
>That's exactly what a couple of US researchers have done for physics papers published by the American Physical Society since 1893
I wasn't aware that Google's PageRank existed in 1893.
They take bets about this kind of thing?
Mon chien, il n'a pas du nez. Comment scent-il? Très mauvais!
Did the star make the movie a hit, or did the movie make the star?
For 'prediction' to be valuable, it has to work with citations that were linked *before* the paper got the Nobel.
So even this article shows that Nicola Cabibbo deserved the Nobel Prize:
http://en.wikipedia.org/wiki/Nicola_Cabibbo
Let's see, so far, computer models have failed to accurately manage loan portfolios to higher risk buyers, failed to manage risk books for hedge funds, could not capture currency trading, can't predict the weather and are probably wrong about climate. Sure, let's have them predict Nobel Prize winners while we are at it!
This is my sig.
Seriously, like this is some kind of weird correlation. No shit Nobel prize winning papers would have excellent page ranks.
Yes, it happens all the time: the Swedish Academy can change their vote any time, if it feels pressed by the media.
Plain old sigh.
They assume that interest in someone's published work is the same whether they are a Nobel Prize winner or not. That is simply not true: papers written by Nobel Prize winners will generate more links and have a higher rating, just because they recently won the prize.
"Google predicts your next bowel movement"
Table-ized A.I.
I plan to do a blog post on this. I am seeing Google Meta-data being gold in more than just the ad revenue point of view. This data is showing up as useful predictors in medical research, and other fields.
Think Deeply.
If you're going to say "predict," you have to look at only the citations that were made *before* the Nobel Prize was given. Otherwise, you're just proving that a Nobel Prize is a fantastic way to market your research.
The algorithm for Google PageRank is based on the concept of citations from academia. If I remember correctly, the software was originally meant only to index academic papers and eventually grew to index the whole internet. So it's not surprising that it predicts winners so well (depending on how much the Nobel committee weights citations in their decisions).
Why doesn't Slashdot ever get slashdotted?
Bowel movements would be pretty easy to predict tbh. You just get the Android app to track your bowel movements, it'll upload it to a Google appliance gizmo that creates a trend.. maybe some input function to add in the primary sections of your diet (for instance, you ate something with a little more fat or fiber.. etc..)
----- The internet has given everyone the ability to have their voice heard equally as loud.. even if they shouldn't be
That was my reaction as well. It only works if you base it on publications prior to them winning the Nobel Prize. Of course people are going to reference the papers after the Prize. Citing a Nobel winner gives a certain boost to credibility.
I can't believe that CS people who favor information retrieval can be so ignorant of the work preceding PageRank. Bibliometrics is a comparatively ancient art. This kind of analysis was done already - look at Harriet Zuckerman's 1977 book "Scientific elite: Nobel laureates in the United States"
The number of times that I've seen computer science researchers "invent" results seen before is astounding. PageRank is one algorithm, but many other citation measures have already been applied to the Nobel prediction game.
Note that they're not looking at webpage referrals, but citations in other scientific papers. Rather than simply counting citations, they're weighting the citations by the number of citations the citing papers received. Thus, if your paper is cited by a paper which is very popular, then your paper will get a boost to its citation score.
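To make the "weighted citation" idea concrete, here is a minimal sketch of textbook PageRank run on a citation graph. It is not necessarily the exact variant the researchers used; the function name, the toy papers, and the damping value of 0.85 (the usual web default) are all assumptions for illustration.

```python
def citation_pagerank(citations, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    citations maps each paper to the list of papers it cites.
    Names and the damping value are illustrative assumptions.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper splits its own score among the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B are each cited exactly twice,
# but one of B's citers (H) is itself cited three times.
toy = {
    "X1": ["A"], "X2": ["A"],
    "Y": ["B"], "H": ["B"],
    "Z1": ["H"], "Z2": ["H"], "Z3": ["H"],
}
ranks = citation_pagerank(toy)
print(ranks["B"] > ranks["A"])  # True
```

A plain citation count would tie A and B at two each; the weighted score puts B ahead because one of its citations comes from a heavily cited paper.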
Not having read the actual paper, the following question comes to mind: did they include only the period of time *before* the physicists got their Nobels? Because if they included the citations after that - yeah, I imagine those authors got quite a few citations being Nobel Prize winners and all...
That's true, unless this algorithm only searches through papers linked before the corresponding announcement--which is what my first thought was on seeing the summary. I did not RTFA, though.
The meek may inherit the earth, but the strong shall take the stars.
See, what did I tell ya? Google lets their employees work a bit on odd experiments, and this is the kind of thing it may lead to. (Will Microsoft compete with Microsoft Bowel 2.0 ?)
Table-ized A.I.
Top 40 music singles chart predicts highest-selling singles of the week with astounding precision!
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
The next step is obviously to let PageRank select the Nobel winners and cut out the middleman.
So novel and useful research is at the hub of a web of citations, and the novel and useful web pages are at the hub of a web of links ....
Puteulanus fenestra mortis
Anyone else really get tired of the friggin tags for a lot of these stories? CorrelationIsNotCausation (this meme here really needs to go; saying it doesn't make you sound smart when it makes no sense or is bleedingly obvious), and BecauseItWillGetGamed? GTFO. How the hell do you as a scientist game the entire spectrum of academic publishing to get yourself voted as a Nobel Prize winner, without, you know, maybe actually doing some good science (and having it further recognized by being cited heavily by peers)? The tags are next to useless unless they are good as flamebait (yes, I am aware of the irony).
And of course the results of their experiment are submitted in the form of a research paper. Hmm, I wonder...
The original paper doesn't really discuss the connections with Nobel prizes - it mentions as an aside that one paper was cited for a Nobel prize - as it's concerned not with predicting Nobel laureates but with evaluating the importance of papers. Therefore any conclusions about predicting Nobel winners are without merit until further analysis is performed.
What idiot(s) tagged this article "correlationisnotcausation"? Obviously no-one implied causation, i.e. that Nobel prizes are awarded to people because they have high PageRanks. It talks about prediction and mentions betting, for both of which correlation is enough.
I know the mainstream media is often quick to jump on the "omg there is correlation, ergo there must be causation" bandwagon, but it obviously isn't the case here. Save such tags for when they're appropriate.
That's not completely true. You can use all citations to create a regression model (or structural equations model or whatever other statistical method you use) that is used to "predict" past and future prize winners. It's really hard to explain in this setting but it's basically using all the aggregate data to create your regression equation, then checking to see if the regression equation was a good fit to the data. From there you should hopefully be able to predict future winners with some degree of accuracy.
I'm not sure if the authors used a method like that or not - I skimmed the original article but don't have time to spend more time on it. In any case, it's not uncommon to use "post" data to help predict "pre" data. That's how you set up a model. Further, it's helpful to be able to use all the "post" data to help you know the size of the error of your prediction. I know I wasn't terribly clear but statistical modeling isn't as straightforward as it might seem.
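For anyone who wants to see the "fit on everything, then check the fit" step spelled out, here is a rough sketch using ordinary least squares on one variable. The numbers are made up purely for illustration, and nothing here is claimed to be the method the authors used; `fit_line` and `mean_squared_error` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the fitted line and the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: x is some citation-based score, y is some outcome measure.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
outcomes = [1.1, 1.9, 3.2, 3.8, 5.1]

slope, intercept = fit_line(scores, outcomes)
error = mean_squared_error(scores, outcomes, slope, intercept)
print(slope, intercept, error)
```

The error computed on the same data used for the fit says how well the line describes the past. It tends to understate the error on genuinely new cases, which is why the timing objection raised elsewhere in the thread still matters.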
It would be quite logical for the Nobelists to get considerably more exposure for the mere fact they won the prize. I would think merely referencing a paper from an author who'd made it up there would give your own research more attention than it would otherwise.
This would be quite obvious, but then again what is Google for anyway?
---
Have you read the Terms of Service lately?
I'm so glad we have the "CorrelationIsNotCausation" tag. I really thought that Google was selecting the Nobel Prize winners all this time.
You mean people who write good papers get Nobel prizes? Wow!
Also, I didn't know that people who won Nobel prizes for fundamental discoveries won't post facto get gratuitous citations in the first line of the introduction of every subsequent paper in the field.
Page Rank captures whatever is `sensational', in every domain of human activity. Having RTFA, I conclude that if all that is sensational is good, then what we have here is an empirical demonstration of circular reasoning. If all that is good need not be sensational, we simply have misleading anecdotal evidence.
The foundation for the work of Messrs. Maslov and Redner was laid by Hari Seldon, who discovered that "while one cannot foresee the actions of a particular individual, the laws of statistics as applied to large groups of people could predict the general flow of future events." The recent paper by Messrs. Maslov and Redner represents the smallest corpus to which Seldon's theory has been successfully applied to date.
Further applications of these techniques to this same corpus will likely fall afoul of Seldon's second axiom: "the population should remain in ignorance of the results of the application of" the analysis.
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
There's a tool that tries to create a network of reviews, rather than just citations. In this case, the reviewer actually specifies the level of endorsement, whereas citations can mean anything. One of the most common reasons to cite a paper is to say "Our idea is way better than this lame idea", or "These guys did something similar, but it comparatively sucked". Sometimes the worst implementations get cited the most because they are so easy to improve upon. Why should that build up a paper?
For the last 8 years it has always gone to the guy who hates Bush and/or America, while supporting GlobalWarming(TM).
It started on shaky ground, taking dynamite profits to appreciate things that "surely" will end war, and nowadays it's like everything else: monetarily defined, without needless morals to get in the way.
The Nobel Prize means nothing, if Al Gore gets one for charging people money to take carbon dioxide out of the air. (When he can't, and CO2 actually COOLS the planet, not warms it.)
Just another loss of another institution...see also Journalism.
Sorry, but links do not make a Nobel.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
I read the blog entry - and all of the three pages of the article.
Firstly, the article on arXiv claims nothing about the predictive power of any of these ranking algorithms on who will win a Nobel Prize.
Secondly, the blog author ("KFC") claims this "suggests an idea. Mining the later entries in this list might be an good way of predicting future prize winners".
Well, the issue here is that these ranks are always computed for "the current state", that is for "today". If these ranks were to have predictive power, they would have to be computed using only data up to the point in time when the Nobel Prize was awarded to the author.
Why is this "time" aspect important? Imagine the following (contrived, extreme) scenario:
* 1950: Author X publishes article
* 1951: Article gets one citation
-- this is the point in time that matters for prediction!
* 1952: Author is awarded the Nobel Prize
* 1953: Plenty of citations for the Nobel Prize winner
So, there you go. The blog author completely neglects the temporal aspect required for the ranks to have predictive (aka "future") value.
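The timeline above can be sketched in a few lines: count only the citations dated before the prize year and compare with the all-time total. The figure of 40 post-prize citations is a made-up stand-in for "plenty", and `citations_before` is a hypothetical helper name.

```python
def citations_before(citation_years, cutoff_year):
    """Count citations dated strictly before cutoff_year."""
    return sum(1 for year in citation_years if year < cutoff_year)


# Scenario from the timeline: one citation in 1951, prize in 1952,
# then an assumed 40 citations in 1953.
citation_years = [1951] + [1953] * 40
prize_year = 1952

print(citations_before(citation_years, prize_year))  # 1
print(len(citation_years))                           # 41
```

Only the first number was available to anyone trying to predict the 1952 prize; a ranking built on the second one is looking at the answer.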
If I eat at Baja Surf, predict bowel movement within 5 minutes of leaving restaurant?
1. The use of words to express something different from and often opposite to their literal meaning.
2. An expression or utterance marked by a deliberate contrast between apparent and intended meaning.
3. A literary style employing such contrasts for humorous or rhetorical effect.
I wonder how different the results are from the normal cumulative impact factor of the scientists' publications....
But I forgot. Google is the only database on the planet....
Such an algorithm may be quite good at indicating popular papers and topics. But there are ideas which are like urban legends. They spread faster than they get falsified. Just think about topics like "cold fusion" or "transmutation of matter". An idea is not good just because it is attractive.
This is one more proof that the Nobel prize in physics this year was a total mistake. It should have been given to Cabibbo, who is rated first in the Pageranking. The work of Kobayashi and Masukawa is heavily based on his original work.
This is also proof that there's a correlation effect, not causation, going on here. Cabibbo didn't receive the Nobel Prize, thus the high number of citations cannot be due to subsequent fame.
If it's a valid predictor, it would produce those results based only on citations before the author receives a Nobel nomination. An author known to be a Nobel nominee, and especially a Nobel prize winner, will receive more citations and page reads based on their Nobel notoriety. An author who fails to cite a Nobel winning paper would be considered to have incomplete references, and the referees or thesis committee will tell them to add those missing citations.
Actually, citation ranking came first and was developed some time in the 1970s. Google's PageRank algorithm was an application of citation ranking to the web. The original PageRank paper even cites the citation ranking papers.
(This also kind of points out a problem with citation ranking: everybody these days is going to cite PageRank, even though the idea originally was developed by other people. So, citation ranking isn't going to tell you who should get the credit, only who popularized an idea.)
This simply confirms that Cabibbo got screwed. Those who do particle physics have long known this and that's why he still gets first billing as the C in the "CKM matrix". This isn't to say that K and M didn't deserve the prize.
Google PageRank = Wisdom of Crowds
And Wisdom of Crowds != Wisdom of Intellectuals
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga