Steps to a new world

Monday, 7 April 2014

Can we predict the next winner of the Nobel Prize in economics?

The best way to learn new software is to experiment with data in order to answer an interesting question. My experience with R has been great so far, and I have found the transition from Matlab to R (for economic applications) quite easy. There are plenty of guides on the web to help you get into R, and R's graphics are pretty amazing too.

Now that I have R (with RStudio as my platform), I needed an interesting question. Most students in economics come across binary response variables at some point, but many of them do not ask interesting questions of their data or explore it carefully. So here I will attempt to illustrate how one can use a simple regression (a probit) to predict possible Nobel Prize winners in economics.

You can find numerous Nobel prediction sites: see here.

The point of this blog entry is to explore some of R's commands and then to fit (imperfectly) a model that predicts possible winners. To do this I had to gather some data. The following data (and their sources) were collected:
  • Previous winners (winners)
  • Previous JB Clark winners (winners)
  • The top 2000 ranked economists according to REPEC criteria (top 2000)
Of course, this data is not sufficient to build a proper model. You would need to control for age (the youngest-ever winner, Kenneth Arrow, was 51), you would need a variable indicating gender (there has been only one female winner), and possibly a variable capturing the importance of a specific paper in shaping economic thought. But the data we have is not too bad. It gives us an idea of which universities improve your chances of winning, of the field of study and how important that field currently is for understanding serious economic questions, and of how well previous prizes in economics help predict a winner.

I now have a small model with the Nobel Prize as the dependent variable (binary: 1 = won, 0 = has not won yet). The explanatory variables are university rank, author rank, gender and the JB Clark medal. Universities with more Nobel Prize winners receive a higher university rank, and a university with no previous winners receives a rank of zero. Author rank comes from the REPEC author ranking list; I scaled it so that values closer to 0 mean a higher REPEC rank. I must admit that I constructed the gender variable from my own reading of first names (1 = male, 0 = female), since it would be a very time-consuming process to Google 2000 ranked authors to confirm their gender. Fortunately, this variable turns out to be statistically insignificant in predicting winners. The JB Clark medal variable (Bates) is also binary: 1 = won, 0 = has not won.
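Written out, the probit specification looks like this. The β notation is my own; the variable names are the column names used in the tables in this post, and Φ is the standard normal cumulative distribution function:

```latex
\Pr(\text{Nobel}_i = 1) = \Phi\left(\beta_0 + \beta_1\,\text{Bates}_i + \beta_2\,\text{Gender}_i + \beta_3\,\text{Uni}_i + \beta_4\,\text{Rank}_i\right)
```

A positive β raises the predicted probability and a negative β lowers it; because Rank is scaled so that smaller values are better, we would expect β₄ to be negative.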

It was quite a tedious task to correctly match all 2000 candidates against the list of Nobel winners, their universities and the JB Clark medallists. Please send me a request if you would like this data; it is currently sorted by REPEC ranking.
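The matching itself is simple once every source uses the same identifier; getting there is the tedious part. Below is a minimal Python sketch of turning winner lists into 0/1 columns. It assumes the names have already been normalised to the lower-case keys used in the tables in this post, and the short lists are illustrative placeholders, not the real data:

```python
# Hypothetical sketch: build 0/1 indicator columns by matching ranked
# candidates against lists of prize winners. All names below are
# placeholders in the style of this post's tables, not the real data.


def indicator(candidates, winners):
    """Return 1 for each candidate found in `winners`, else 0.

    Keys are stripped and lower-cased so trivial formatting differences
    do not cause a missed match.
    """
    winner_set = {w.strip().lower() for w in winners}
    return [1 if c.strip().lower() in winner_set else 0 for c in candidates]


# Candidates in REPEC order (illustrative subset)
candidates = ["andreishleifer", "jamesjheckman", "robertjbarro"]
nobel_winners = ["jamesjheckman", "robertelucasjr"]      # illustrative
clark_medallists = ["andreishleifer", "jamesjheckman"]   # illustrative

nobel = indicator(candidates, nobel_winners)
bates = indicator(candidates, clark_medallists)

for name, n, b in zip(candidates, nobel, bates):
    print(f"{name:<16} Nobel={n} Bates={b}")
```

Using a set keeps each lookup cheap, which barely matters for 2000 names but keeps the code simple. Reconciling different spellings of the same person across sources is not handled here.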

Now we are ready to use R. I use the ggplot2 package for nice-looking figures. After loading, the data looks as follows:

Table 1: The data

             Name Nobel Bates Gender  Uni   Rank
1  andreishleifer     0     1      1 0.07 0.0304
2   jamesjheckman     1     1      1 0.13 0.0345
3    robertjbarro     0     0      1 0.07 0.0487
4 josephestiglitz     1     1      1 0.07 0.0500
5 petercbphillips     0     0      1 0.04 0.0746
6   daronacemoglu     0     1      1 0.06 0.0822
7  robertelucasjr     1     0      1 0.13 0.0944

Andrei Shleifer is currently the highest-ranked economist on REPEC. Apart from Name and the dependent variable Nobel, the columns are the explanatory variables. I would first like to summarise the data before estimating the regression. About 63% of JB Clark medallists have gone on to win a Nobel Prize, although some of the medallists are still too “young” to have won. Approximately 10% of the candidates in the REPEC author rankings are female.
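Summaries like these are just conditional shares. As an illustration, here is a small Python sketch that uses only the seven rows in Table 1, so its numbers will not match the full-sample 63% and 10% figures. It assumes Gender is coded 1 = male, 0 = female:

```python
# Seven rows copied from Table 1: (name, Nobel, Bates, Gender, Uni, Rank)
ROWS = [
    ("andreishleifer", 0, 1, 1, 0.07, 0.0304),
    ("jamesjheckman", 1, 1, 1, 0.13, 0.0345),
    ("robertjbarro", 0, 0, 1, 0.07, 0.0487),
    ("josephestiglitz", 1, 1, 1, 0.07, 0.0500),
    ("petercbphillips", 0, 0, 1, 0.04, 0.0746),
    ("daronacemoglu", 0, 1, 1, 0.06, 0.0822),
    ("robertelucasjr", 1, 0, 1, 0.13, 0.0944),
]


def share(values):
    """Fraction of ones in a list of 0/1 values; None for an empty list."""
    return sum(values) / len(values) if values else None


# Share of JB Clark medallists (Bates = 1) who also won a Nobel Prize
clark_nobel = [nobel for _, nobel, bates, _, _, _ in ROWS if bates == 1]

# Share of candidates coded as female, assuming Gender is 1 = male, 0 = female
female = [1 - gender for _, _, _, gender, _, _ in ROWS]

print("Clark medallists who won:", share(clark_nobel))
print("Female share:", share(female))
```

On these seven rows the first share is 0.5 (Heckman and Stiglitz out of four medallists) and the second is 0.0; the same two lines applied to all 2000 rows give the figures quoted above.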
Next I estimate the probit model. All the slope coefficients except the one on Gender are significant:

Table 2: Regression output
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) -2.12882    0.39778  -5.352 8.71e-08 ***
Bates        1.21190    0.25903   4.679 2.89e-06 ***
Gender       0.41422    0.37742   1.098    0.272   
Uni          7.04045    1.59790   4.406 1.05e-05 ***
Rank        -0.06406    0.01549  -4.136 3.53e-05 ***
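Table 2 has the layout that R's summary() prints for a glm() fit. A call such as glm(Nobel ~ Bates + Gender + Uni + Rank, family = binomial(link = "probit")) would produce it, though the exact command is not shown here, so treat that as an assumption. Whatever the call, a predicted probability is simply Φ applied to the linear index. A Python sketch using the point estimates from Table 2:

```python
import math

# Point estimates copied from Table 2
COEF = {
    "Intercept": -2.12882,
    "Bates": 1.21190,
    "Gender": 0.41422,
    "Uni": 7.04045,
    "Rank": -0.06406,
}


def norm_cdf(z):
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(bates, gender, uni, rank, coef=COEF):
    """Predicted P(Nobel = 1) = Phi(linear index) for one candidate."""
    index = (coef["Intercept"]
             + coef["Bates"] * bates
             + coef["Gender"] * gender
             + coef["Uni"] * uni
             + coef["Rank"] * rank)
    return norm_cdf(index)


# James Heckman's row from Table 1: Bates=1, Gender=1, Uni=0.13, Rank=0.0345
print(round(probit_prob(1, 1, 0.13, 0.0345), 4))  # → 0.6592
```

Reproducing a value from the Probab column of Table 3 (0.6592267 for Heckman) is a quick check that the coefficients and the variable scaling have been read correctly.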


Remember that a higher value of Rank means a lower position on REPEC. The results make sense to me: the probability of winning a Nobel Prize increases if you are a previous JB Clark medallist, if you are male (although Gender is not statistically significant, so little weight should be placed on being male or female), if you attended or lecture at a university that has hosted previous winners, and if you have a higher REPEC rank (a Rank value closer to 0).
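A short derivation shows why the signs in Table 2 can be read this way even though the magnitudes cannot. In a probit, the marginal effect of a continuous regressor is the normal density at the linear index times its coefficient, and for a binary regressor such as Bates the natural measure is the difference in predicted probabilities (standard probit results; the notation is mine):

```latex
\begin{aligned}
\Pr(\text{Nobel}_i = 1 \mid x_i) &= \Phi(x_i'\beta), \\
\frac{\partial \Pr(\text{Nobel}_i = 1 \mid x_i)}{\partial x_{ik}} &= \phi(x_i'\beta)\,\beta_k, \\
\Delta_i^{\text{Bates}} &= \Phi(x_i'\beta)\big|_{\text{Bates}_i = 1} - \Phi(x_i'\beta)\big|_{\text{Bates}_i = 0}.
\end{aligned}
```

Because φ is always positive, each effect has the sign of its coefficient, but its size depends on where the candidate sits. For example, with the Table 2 estimates, Heckman's index is about 0.410, giving Φ(0.410) ≈ 0.659; without the medal the index would be 0.410 − 1.212 ≈ −0.802, giving Φ(−0.802) ≈ 0.211, so for this profile the medal adds roughly 0.45 to the predicted probability.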

Probit coefficients have no direct interpretation as changes in probability, because each one shifts a latent index that is then passed through the standard normal CDF. One way to evaluate them is with prediction curves. The figure below shows the predicted probability of winning (y-axis) as the REPEC rank varies (x-axis); the different shades of blue correspond to different values of the university rank. The figure shows that the probability of winning a Nobel Prize is over 0.6 for someone with a very good publication record who attended or works at a university that has hosted Nobel laureates.
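Prediction curves like these are easy to compute outside R as well. The Python sketch below uses the Table 2 estimates; the grid of ranks, the three university values, and holding Bates = 1 and Gender = 1 are my assumptions, since it is not stated which values Figure 1 holds fixed:

```python
import math

# Point estimates from Table 2
B0, B_BATES, B_GENDER, B_UNI, B_RANK = -2.12882, 1.21190, 0.41422, 7.04045, -0.06406


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prediction_curve(uni, ranks, bates=1, gender=1):
    """Predicted P(Nobel = 1) at each Rank value, holding the other regressors fixed."""
    fixed = B0 + B_BATES * bates + B_GENDER * gender + B_UNI * uni
    return [norm_cdf(fixed + B_RANK * r) for r in ranks]


# Assumed grid of scaled REPEC ranks and illustrative university values
ranks = [0.0, 2.0, 4.0, 6.0, 8.0]
curves = {uni: prediction_curve(uni, ranks) for uni in (0.0, 0.07, 0.13)}

for uni, probs in curves.items():
    print(f"Uni={uni:.2f}:", " ".join(f"{p:.3f}" for p in probs))
```

Plotting each list against the rank grid (with ggplot2 in R, or any plotting library) gives one line per university value; every curve slopes down because the Rank coefficient is negative.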

Figure 1: Prediction curves
We can use this model to predict likely winners. The results also include past winners as well as people who have passed away (who obviously cannot win); we can use these to evaluate the fit of the model. The table is sorted by the predicted probability of winning. The results, while interesting, are not that reliable: a number of academics who have won score a fairly low predicted probability.
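Ranking candidates and checking the fit only needs the predicted probabilities. A Python sketch on the seven rows of Table 1, using the Table 2 estimates; the 0.5 cut-off for calling something a "miss" is an arbitrary choice of mine:

```python
import math

# Table 2 estimates: intercept, Bates, Gender, Uni, Rank
COEF = (-2.12882, 1.21190, 0.41422, 7.04045, -0.06406)

# Rows from Table 1: (name, Nobel, Bates, Gender, Uni, Rank)
ROWS = [
    ("andreishleifer", 0, 1, 1, 0.07, 0.0304),
    ("jamesjheckman", 1, 1, 1, 0.13, 0.0345),
    ("robertjbarro", 0, 0, 1, 0.07, 0.0487),
    ("josephestiglitz", 1, 1, 1, 0.07, 0.0500),
    ("petercbphillips", 0, 0, 1, 0.04, 0.0746),
    ("daronacemoglu", 0, 1, 1, 0.06, 0.0822),
    ("robertelucasjr", 1, 0, 1, 0.13, 0.0944),
]


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted(row):
    """Predicted P(Nobel = 1) for one row of the data."""
    _, _, bates, gender, uni, rank = row
    b0, b_bates, b_gender, b_uni, b_rank = COEF
    return norm_cdf(b0 + b_bates * bates + b_gender * gender
                    + b_uni * uni + b_rank * rank)


# Sort from most to least likely, as in Table 3
ranked = sorted(ROWS, key=predicted, reverse=True)
for row in ranked:
    print(f"{row[0]:<16} Nobel={row[1]} p={predicted(row):.3f}")

# Crude fit check: past winners who score below the arbitrary 0.5 cut-off
misses = [row[0] for row in ROWS if row[1] == 1 and predicted(row) < 0.5]
print("Winners scored below 0.5:", misses)
```

On these rows, Joseph Stiglitz (just under 0.5) and Robert Lucas (about 0.21, with no JB Clark medal) are both winners the model would miss, which is the kind of poor fit described above.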
The results suggest that Kevin Murphy, with a predicted probability of about 0.63, has the best chance of winning sometime in the future among those who have not yet won:

Table 3: Likely winners
REPEC           Name Nobel Bates Gender  Uni   Rank    Probab 
2      jamesjheckman     1     1      1 0.13 0.0345 0.6592267
13       garysbecker     1     1      1 0.13 0.1917 0.6555262
142     kevinmmurphy     0     1      1 0.13 1.3531 0.6277450
162    kennethjarrow     1     1      1 0.13 1.5665 0.6225621
213     stevenlevitt     0     1      1 0.13 2.1345 0.6086614
256  daniellmcfadden     1     1      1 0.13 2.5273 0.5989653
306      davidmkreps     0     1      1 0.13 3.0071 0.5870391
449   amichaelspence     1     1      1 0.13 4.3881 0.5522986
20      paulrkrugman     1     1      1 0.08 0.2650 0.5173747
1     andreishleifer     0     1      1 0.07 0.0304 0.4952882
4    josephestiglitz     1     1      1 0.07 0.0500 0.4947874
27  lawrencehsummers     0     1      1 0.07 0.2978 0.4884562
6      daronacemoglu     0     1      1 0.06 0.0822 0.4659186
805    jonathanlevin     0     1      1 0.13 7.9372 0.4618090
67     jerryahausman     0     1      1 0.06 0.7140 0.4498638
497     robertmsolow     1     1      1 0.06 4.9049 0.3466185
667        rajchetty     0     1      1 0.07 6.4323 0.3365476
12  martinsfeldstein     0     1      1 0.00 0.1817 0.3035090
40        davidecard     0     1      1 0.00 0.3929 0.2987970
 
There are many interesting questions that one could ask with the available data. In this case I was curious to get an idea of who the likely Nobel Prize winners will be. Unfortunately, the JB Clark (Bates) medal carries the most weight in predicting winners in this model, so this is definitely not a realistic model. But it illustrates some interesting concepts: if you want to win a Nobel Prize, make sure to go to a university where many Nobel laureates made their name, make sure to have a good publication record, and try to win a JB Clark medal.