The best way to learn new software is to experiment with data in
order to answer some interesting question. My experience with R has been great so far.
I also found the transition from Matlab to R (for economic applications)
quite easy. There are plenty of guides on the web to help you get into
R, and R’s graphics are pretty amazing too.
Now that I had R (with RStudio as platform), I needed an
interesting question. Most students in economics come across binary response
variables at some point in their lives. But a lot of students do not
ask interesting questions and do not explore the data properly. So here I
will attempt to illustrate how one can use a simple regression to predict possible
Nobel Prize winners in economics.
You can find numerous Nobel prediction sites: see here.
The point of this blog entry is to explore some of R’s commands and then to fit (imperfectly) a model that predicts possible winners. To do this I had to gather a bit of data. The following data (and their sources) were collected:
- Previous winners (winners)
- Previous JB Clark winners (winners)
- The top 2000 ranked economists according to REPEC criteria (top 2000)
Of course this data is not sufficient to build a proper model:
you would need to control for age (the youngest ever winner was Arrow, aged 51),
you would need a variable indicating gender (there has been only one female winner)
and possibly a variable indicating the importance of a specific paper in
shaping economic thought. But the data we have is not too bad. It gives
us an idea of which universities improve your chances of winning, of the field
of study and its current importance for understanding serious economic
questions, and of how far previous prizes in economics help to predict a
winner.
I now have a little model with the Nobel Prize as dependent variable (binary: 1 = won, 0 =
has not won yet). The explanatory variables are university rank, author
rank, gender and JB Clark medal. The university with the highest rank is the
one with the most Nobel Prize winners; a university with no previous winners
receives a rank of zero. Author rank is based on the REPEC author ranking
list, scaled so that 0 means the highest REPEC rank. I must admit that
I constructed the gender variable from my own reading of first names, since it
would be a very time-consuming process to Google 2000 ranked authors to
ascertain their gender. Fortunately this variable turns out to be statistically
insignificant in predicting winners. The JB Clark medal variable is also binary, where
1 = won and 0 = has not won.
It was quite a tedious task to correctly match all 2000 candidates
with the list of Nobel winners, their universities and the JB Clark winners. Please
send me a request if you would like this data; it is currently sorted
according to REPEC ranking.
Now we are ready to use R. I use the ggplot2 package for nice
looking figures. The data looks as follows after loading it:
Table 1: The data
| | Name | Nobel | Bates | Gender | Uni | Rank |
|---|---|---|---|---|---|---|
| 1 | andreishleifer | 0 | 1 | 1 | 0.07 | 0.0304 |
| 2 | jamesjheckman | 1 | 1 | 1 | 0.13 | 0.0345 |
| 3 | robertjbarro | 0 | 0 | 1 | 0.07 | 0.0487 |
| 4 | josephestiglitz | 1 | 1 | 1 | 0.07 | 0.0500 |
| 5 | petercbphillips | 0 | 0 | 1 | 0.04 | 0.0746 |
| 6 | daronacemoglu | 0 | 1 | 1 | 0.06 | 0.0822 |
| 7 | robertelucasjr | 1 | 0 | 1 | 0.13 | 0.0944 |
Andrei Shleifer is currently the highest ranked on REPEC. Nobel is
the dependent variable and the remaining columns are the explanatory variables.
I would first like to summarise the data before I estimate the little
regression. About 63% of JB Clark medallists have gone on to win a Nobel Prize;
some of the medallists are still too “young” to win one. Approximately 10% of
the candidates in the REPEC author rankings are female.
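A minimal sketch of how such summaries might be computed in R. Since the full data set is not reproduced in this post, the data frame below holds only the seven rows shown in Table 1, so its numbers will not match the 63% and 10% quoted for all 2000 candidates. The object name `nobel` and the coding Gender = 1 for male are assumptions for illustration.

```r
# Toy data frame with only the seven rows shown in Table 1
# (the full data set has 2000 rows and is not reproduced here).
nobel <- data.frame(
  Name   = c("andreishleifer", "jamesjheckman", "robertjbarro",
             "josephestiglitz", "petercbphillips", "daronacemoglu",
             "robertelucasjr"),
  Nobel  = c(0, 1, 0, 1, 0, 0, 1),
  Bates  = c(1, 1, 0, 1, 0, 1, 0),
  Gender = c(1, 1, 1, 1, 1, 1, 1),
  Uni    = c(0.07, 0.13, 0.07, 0.07, 0.04, 0.06, 0.13),
  Rank   = c(0.0304, 0.0345, 0.0487, 0.0500, 0.0746, 0.0822, 0.0944)
)

# Share of JB Clark medallists who also hold a Nobel Prize
share_medal_nobel <- mean(nobel$Nobel[nobel$Bates == 1])

# Share of female candidates, assuming Gender == 1 codes male
share_female <- mean(nobel$Gender == 0)

# Cross-tabulation of the medal against the prize
medal_by_nobel <- table(Bates = nobel$Bates, Nobel = nobel$Nobel)
print(medal_by_nobel)
```

On the full data the same three lines would give the shares discussed above; `summary(nobel)` is another quick way to eyeball every column at once.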
Next I estimate the Probit. All the slope coefficients, except for
gender, are significant:
Table 2: Regression output
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.12882    0.39778  -5.352 8.71e-08 ***
Bates         1.21190    0.25903   4.679 2.89e-06 ***
Gender        0.41422    0.37742   1.098    0.272
Uni           7.04045    1.59790   4.406 1.05e-05 ***
Rank         -0.06406    0.01549  -4.136 3.53e-05 ***
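A probit of this form can be estimated in R with `glm()` and a probit link. The sketch below is not the code behind Table 2: because the real data is not included in the post, it simulates a stand-in data set (`sim`, with made-up shares of medallists, gender and university ranks) and generates outcomes from the coefficients reported in Table 2. Its estimates will therefore differ from the table.

```r
set.seed(1)
n <- 2000

# Synthetic stand-in for the real data (NOT the data used in this post).
sim <- data.frame(
  Bates  = rbinom(n, 1, 0.02),                      # assumed share of medallists
  Gender = rbinom(n, 1, 0.90),                      # assumed: 1 = male
  Uni    = sample(c(0, 0.04, 0.07, 0.13), n, replace = TRUE),
  Rank   = seq_len(n) / 100                         # assumed rank scaling
)

# Draw outcomes from a probit using the coefficients reported in Table 2
eta <- with(sim, -2.12882 + 1.21190 * Bates + 0.41422 * Gender +
                 7.04045 * Uni - 0.06406 * Rank)
sim$Nobel <- rbinom(n, 1, pnorm(eta))

# A probit is a binomial GLM with a probit link
fit <- glm(Nobel ~ Bates + Gender + Uni + Rank,
           family = binomial(link = "probit"), data = sim)

# Same layout as Table 2: Estimate, Std. Error, z value, Pr(>|z|)
print(summary(fit)$coefficients)
```

Swapping `sim` for the real data frame would be the only change needed; `summary(fit)` prints the coefficient table together with the significance stars.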
Remember that a larger number in Rank means that you are ranked lower on REPEC. The results make sense to me: the probability of winning a Nobel Prize increases if you are a previous JB Clark medallist, if you are male (although gender is not statistically significant, so little weight should be placed on it), if you attended or lecture at a university that hosted previous winners, and if you have a higher rank on REPEC (where 0 is the highest rank).
Probit coefficients cannot be read directly as changes in probability, because they act on the probability through the normal CDF. One way to evaluate them is by means of prediction curves. The figure below shows the predicted probability of winning: the y-axis is the predicted probability, while the x-axis varies the REPEC rank. The different shades of blue correspond to different university ranks. The figure shows that the probability of winning a Nobel Prize is over 0.6 for someone with a very good publication record who attended a university that hosted Nobel laureates.
Figure 1: Prediction curves
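Curves like these can be rebuilt from the reported coefficients alone, since the predicted probability is the normal CDF of the linear index. The sketch below is an approximation of the figure, not the original plotting code: it assumes the curves are drawn for a male JB Clark medallist, picks a few university ranks that appear in the tables, and lets the scaled rank run from 0 to 20. The helper `prob_win` is a hypothetical name.

```r
# Coefficients copied from Table 2
b <- c(Intercept = -2.12882, Bates = 1.21190, Gender = 0.41422,
       Uni = 7.04045, Rank = -0.06406)

rank_vals <- seq(0, 20, by = 0.5)      # scaled REPEC rank (x-axis), assumed range
uni_vals  <- c(0, 0.04, 0.07, 0.13)    # a few university ranks from the tables

# Predicted probability for a male JB Clark medallist (assumed profile)
prob_win <- function(rank, uni) {
  pnorm(b[["Intercept"]] + b[["Bates"]] + b[["Gender"]] +
        b[["Uni"]] * uni + b[["Rank"]] * rank)
}

# One column per university rank, one row per value of the REPEC rank
curves <- sapply(uni_vals, function(u) prob_win(rank_vals, u))
colnames(curves) <- paste0("Uni=", uni_vals)

# Probabilities at scaled rank 0, 5, 10 and 20
print(round(curves[c(1, 11, 21, 41), ], 3))
```

Calling `matplot(rank_vals, curves, type = "l")` would draw the four curves with base graphics; the same matrix could be reshaped to long format for ggplot2.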
We can use this model to predict likely winners. The results
also contain past winners as well as people who have passed away (obviously
they cannot win), which lets us evaluate the fit of the model. The table
is sorted according to the predicted probability of winning. The results, while
interesting, are not that reliable: a number of academics who have
won score a fairly low probability of winning.
The results suggest that Kevin Murphy has a high chance of
winning sometime in the future:
Table 3: Likely winners
REPEC Name Nobel Bates Gender Uni Rank Probab
2 jamesjheckman 1 1 1 0.13 0.0345 0.6592267
13 garysbecker 1 1 1 0.13 0.1917 0.6555262
142 kevinmmurphy 0 1 1 0.13 1.3531 0.6277450
162 kennethjarrow 1 1 1 0.13 1.5665 0.6225621
213 stevenlevitt 0 1 1 0.13 2.1345 0.6086614
256 daniellmcfadden 1 1 1 0.13 2.5273 0.5989653
306 davidmkreps 0 1 1 0.13 3.0071 0.5870391
449 amichaelspence 1 1 1 0.13 4.3881 0.5522986
20 paulrkrugman 1 1 1 0.08 0.2650 0.5173747
1 andreishleifer 0 1 1 0.07 0.0304 0.4952882
4 josephestiglitz 1 1 1 0.07 0.0500 0.4947874
27 lawrencehsummers 0 1 1 0.07 0.2978 0.4884562
6 daronacemoglu 0 1 1 0.06 0.0822 0.4659186
805 jonathanlevin 0 1 1 0.13 7.9372 0.4618090
67 jerryahausman 0 1 1 0.06 0.7140 0.4498638
497 robertmsolow 1 1 1 0.06 4.9049 0.3466185
667 rajchetty 0 1 1 0.07 6.4323 0.3365476
12 martinsfeldstein 0 1 1 0.00 0.1817 0.3035090
40 davidecard 0 1 1 0.00 0.3929 0.2987970
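The Probab column can be checked by hand: it is the normal CDF applied to the linear index built from the Table 2 coefficients. The sketch below does this for four rows of Table 3 and sorts them the same way. The data frame `cand` is only an illustrative subset typed in from the table, not the full data.

```r
# Coefficients copied from Table 2
b <- c(Intercept = -2.12882, Bates = 1.21190, Gender = 0.41422,
       Uni = 7.04045, Rank = -0.06406)

# Four candidates typed in from Table 3 (all are male medallists there)
cand <- data.frame(
  Name   = c("davidecard", "andreishleifer", "kevinmmurphy", "jamesjheckman"),
  Bates  = 1,
  Gender = 1,
  Uni    = c(0.00, 0.07, 0.13, 0.13),
  Rank   = c(0.3929, 0.0304, 1.3531, 0.0345)
)

# Linear index, then the standard normal CDF gives the probit probability
index <- b[["Intercept"]] + b[["Bates"]] * cand$Bates +
         b[["Gender"]] * cand$Gender + b[["Uni"]] * cand$Uni +
         b[["Rank"]] * cand$Rank
cand$Probab <- pnorm(index)

# Sort by predicted probability, highest first, as in Table 3
cand <- cand[order(cand$Probab, decreasing = TRUE), ]
print(cand)
```

With a fitted `glm` object at hand, `predict(fit, newdata = cand, type = "response")` would return the same kind of probabilities without writing out the index.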
There are many interesting questions that one could ask with available data. In this case I was curious to get an idea of who the likely Nobel Prize winners will be. Unfortunately the JB Clark medal carries the largest weight in predicting winners in this model, so this is definitely not a realistic model. But it illustrates some interesting points: if you want to win a Nobel Prize, make sure to go to a university where many Nobel laureates made a name, make sure to have a good publication record and try to win a JB Clark medal.