Being Careful with Multilevel Regression with Poststratification

October 14, 2013

By Justin Esarey

Over at Andrew Gelman’s blog, there’s an interesting discussion going on about a new paper by Buttice and Highton assessing the accuracy of MRP estimates of state-level opinion. (MRP is short for Multilevel Regression with Poststratification, sometimes called “Mister P.”) MRP is used to recover state-level estimates of public opinion that are more accurate than the raw means of the state samples, especially when, as is usually the case, the number of respondents in any one state is small.
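To make the procedure concrete, here is a minimal sketch of the two MRP steps in R. Everything in it is hypothetical: a survey data frame survey with a binary opinion item y, demographic factors age and educ, and a state factor state, plus a census data frame census with one row per age-education-state cell and a population count n for each cell.

library(lme4)

## Step 1 (multilevel regression): model individual opinion with a
## random intercept for each demographic and geographic grouping
fit <- glmer(y ~ (1 | age) + (1 | educ) + (1 | state),
             data = survey, family = binomial(link = "logit"))

## Step 2 (poststratification): predict support in every census cell,
## then take the population-weighted average of the cells in each state
census$cell_pred <- predict(fit, newdata = census, type = "response",
                            allow.new.levels = TRUE)
mrp_est <- tapply(census$cell_pred * census$n, census$state, sum) /
  tapply(census$n, census$state, sum)

The resulting mrp_est vector holds one opinion estimate per state, borrowing strength across groups through partial pooling rather than relying on each state’s (possibly tiny) raw sample mean.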

I highly recommend taking a look at the Buttice and Highton paper first; it is worth a read if you’re thinking about using MRP in an analysis. My own take is that MRP is a very useful tool, and that the Buttice and Highton paper is a caution against applying it indiscriminately, without carefully assessing whether the conditions for it to work well are met. As Jeff Lax and Justin Phillips say:

But one cannot blindly run MRP and expect it to work well. Users must take the time to make sure they have a reasonable model for predicting opinion. Indeed, one way to read the BH piece is that if you randomly choose a survey question from those CCES surveys and throw just any state-level predictor at it (or maybe worse, no state-level predictor), the MRP estimates that result will not be as good as those you have seen used in the substantive literature invoking MRP. Indeed, they point out that only one published MRP paper (Pacheco) fails to follow their recommendation to use a state-level predictor.
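Concretely, “throwing a state-level predictor at it” amounts to adding a state-level covariate as a fixed effect in the individual-level model. Continuing the hypothetical sketch above, with a variable repvote (say, the state’s Republican presidential vote share) merged into both survey and census:

## hypothetical: same frames as above, now with a state-level predictor
fit2 <- glmer(y ~ repvote + (1 | age) + (1 | educ) + (1 | state),
              data = survey, family = binomial(link = "logit"))

The poststratification step is unchanged; the state-level predictor simply gives the model a better baseline for states with few respondents.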

Jeff and Justin also note the existence of a new software package to perform MRP:

MRP, the package. Use the new MRP package, available using the installation instructions below and to be available more easily soon. For now, use versions of the blme and lme4 packages that predate versions 1.x. Using the devtools package, the following commands will install the latest versions of mrpdata and mrp:

library(devtools)
## note: requires pre-1.x versions of blme and lme4, as mentioned above
install_github("mrp", "malecki", sub="mrpdata")  # companion data package
install_github("mrp", "malecki", sub="mrp")      # the mrp package itself

Buttice and Highton have a response to the comments of Lax and Phillips and to some additional criticisms from Yair Ghitza:

Suppose our analyses and results are deeply flawed and deserve to be disregarded completely, a supposition that we recognize may reflect the views of some readers. Consider the question that motivated our article: How confident should a researcher who only has a single national survey sample of 1,500 (or even 3,000) respondents be in the MRP estimates of state opinion produced with it?

Setting aside our article, the only other published studies that assess MRP performance with samples like these are Lax and Phillips (2009) and Warshaw and Rodden (2012). The former assess MRP performance for two opinions and the latter do it for six. And, Warshaw and Rodden (2012) do find what we would call nontrivial variation in average MRP performance across items (look at the MRP entries for the six opinion items when N=2,500 in their Figure 5; we highlighted the relevant entries). On the basis of Lax and Phillips (2009) and Warshaw and Rodden (2012), then, we would not draw the inference that MRP will consistently and routinely perform well across different opinions or even for the same opinion at different points in time.

And, even when MRP has worked well, we are unsure how the researcher can verify its performance. The investigations of MRP performance that preceded ours – and ours, too – all assess the quality of the estimates by comparing them to “true” values. In the absence of knowing the true values we do not see how the researcher could determine how “good” or “bad” the MRP estimates are, and we would therefore hesitate to use them. That said, developing a validation technique for MRP estimates when “true” values are unknown appears to be an issue that LP are working on, and we look forward to reading the next version of their MPSA paper.
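Their validation point is easy to state in code: every published assessment of MRP, theirs included, scores the estimates against a benchmark treated as truth (for example, state means disaggregated from a very large pooled sample), which a researcher with one small survey does not have. A sketch, assuming a hypothetical benchmark vector truth aligned with the mrp_est vector from the earlier sketch:

## how existing assessments score MRP -- feasible only when a
## "true" benchmark is available
mae <- mean(abs(mrp_est - truth))  # mean absolute error
r   <- cor(mrp_est, truth)         # correlation with the benchmark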

Check it out!