By Thomas J. Leeper
James E. Monogan, III. 2015. Political Analysis Using R. Springer.
There are a lot of books about R. A partial list on The R Project’s website includes 157 titles as of May 2016, and that list has some glaring omissions, such as Thomas Lumley’s Complex Surveys and Hadley Wickham’s (in press) R for Data Science. Jamie Monogan gives us a new addition to this long list in the form of Political Analysis Using R. Even in a crowded field, Monogan’s text – hereafter PAUR – is a welcome addition and one that will fit nicely into a political science course on quantitative methods.
Contents
PAUR offers 11 chapters, beginning with an illustrated introduction to R for novices and ending with a relatively high-level discussion of R programming. Each chapter contains clearly highlighted example code, reproductions of R console output, numerous tables and figures, and a set of practice problems to test knowledge of content covered in the chapter.
Chapter 1 offers the very basics: how to download and install the software, and how to install add-on packages and use the R graphical user interfaces across Windows, Mac OS, and Linux. Chapter 2 covers data import (and export), along with basic manipulations, merging, and recoding.
Chapter 3 introduces R’s base graphics functionality, covering histograms, bar charts, boxplots, scatterplots, and line graphs. It then offers a quick overview of lattice graphics to implement these and other visualizations. A notable absence from the chapter (and one noted by the author) is the increasingly popular ggplot2 package. The choice to rely exclusively on base graphics walks a difficult line, favoring the underlying strengths and limitations of the core library over those of add-on packages. Instructors using PAUR but wishing to teach ggplot2 will have to look elsewhere for relevant coverage of this material, perhaps to Wickham’s ggplot2 book, now in its second edition. Chapter 4 covers familiar territory of descriptive statistics, including central tendency and dispersion. I appreciated the way PAUR covered these topics, presenting formulae, code, and graphical depictions of distributions close to one another.
As is a consistent theme throughout the text, PAUR presents practical R implementations of statistical problems as part of larger substantive discussion of real political science examples. Indeed, one of PAUR’s key strengths for a political science audience is its reliance on a familiar set of datasets from real political science applications. Leveraging and role modelling good open science practices, PAUR provides a Dataverse with complete data and code for all examples, which are in turn drawn from publicly available data and code used in published research articles. This should make it extremely easy for instructors to use PAUR in a quantitative methods sequence, by closely linking formal coverage of techniques, the substantive application of those techniques in political science articles, and implementation of those techniques in R. PAUR means there is little excuse to continue to use iris to teach scatterplots or mtcars to teach linear regression.
Chapter 5 offers basic statistical hypothesis testing, as well as other techniques of bivariate association (e.g., cross-tabulation). This chapter uses the gmodels package to provide cross-tabulations, which is a somewhat unfortunate reminder of R’s weaknesses in basic cross-tabulation, but a good decision from the perspective of teaching tabulation to those new to statistics or coming to the language from Stata or SPSS. This chapter probably could have taken a different route and used R to teach the logic of statistical significance (e.g., through simulations), but instead focuses mainly on how to implement specific procedures (t-test, correlation coefficient, etc.).
Chapter 6 marks a rapid acceleration in the breadth and density of content offered by PAUR. While the first 5 chapters provide a first course in statistical analysis in R, the second half of the book quickly addresses a large number of approaches that may or may not fit in that setting. Chapter 6 covers OLS and Chapter 7 covers logit and probit models, ordinal outcome models, and count models. (By comparison, John Verzani’s (2005) Using R for Introductory Statistics ends with half of a chapter on logistic regression; John Fox and Sanford Weisberg’s (2011) An R Companion to Applied Regression covers GLMs over two chapters as a final advanced topic.)
This transition from an elementary textbook on statistics to a sophisticated introduction to the most commonly used methods in political science is both a strength and a challenge for PAUR. On the one hand, it greatly expands the usefulness of the text beyond an undergraduate companion text to something that could reasonably fit in a master’s-level or even PhD methods sequence. On the other, it means that some instructors may find it difficult to cover all of the topics in the text during a 15-week semester (and certainly not in a 10-week quarter). That said, the text covers many of the topics that were addressed in the “grab bag” first-year methods course I took in graduate school and would have been an immensely helpful companion as I first trudged through linear algebra, maximum likelihood estimation, and time-series in R.
To highlight some of the content covered here, Chapter 6 addresses linear regression and does a good job of leveraging add-on packages to introduce model output (with xtable), model diagnostics (with lmtest and car), and heteroskedasticity (with sandwich). Chapter 7 turns to generalized linear models using examples from the Comparative Study of Electoral Systems.
Chapter 8 is a real gem. Here Monogan made the right choice to enlist an army of excellent packages to teach advanced topics not commonly covered in competing textbooks: lme4 to teach mixed effects or multi-level models, along with some of political scientists’ contributions to R in the form of MCMCpack to teach Bayesian regression, cem to showcase matching methods, and wnominate to teach roll call analysis. These are topics and packages that would be unusual to see in other introductions to R or other statistical texts, which clearly shows Monogan’s intention in PAUR to provide a textbook for up-to-date political analysis.
Chapter 9 covers time series methods. I am always a bit ambivalent about teaching these in a general course, but the chapter presents the methods clearly so the key aspects are there for those who want to include them. Chapters 10 and 11 serve as a high-level capstone with coverage of matrix operations, basic programming (functions, loops, conditional expressions, etc.), optimization, and simulation. Again, as with everything in the latter third of the book, these elements make PAUR stand out among competitors as a text that is particularly appropriate for teaching methods of quantitative political science as it is currently practiced.
Overall Evaluation
PAUR is neither a reference manual nor a book about R as a programming language. It is, as its title clearly states, a guidebook to practicing quantitative political science. It is the kind of text that will make it easier to teach postgraduate students how to use R, and it will also serve as a relevant companion text to an intermediate or advanced course in quantitative methods taught at other levels.
I suspect political scientists coming to R from Stata would also find the text particularly attractive given its coverage of nearly all statistical techniques in wide use in the discipline today and its reliance on familiar disciplinary examples. It rightly does not attempt to rival, say, Cameron and Trivedi’s (2010) Microeconometrics Using Stata in scope, but adopts a focus on more cutting-edge techniques at the expense of minutiae about older methods.
I applaud Monogan for Political Analysis Using R, for the ambition to provide a broadly relevant and useful new text on R, and for showcasing the value of data sharing and reproducible research as a model for learning and teaching quantitative research. And I only dock him a few points for leaving out ggplot2. Well done.