By Barry C. Burden, David T. Canon, Kenneth R. Mayer, and Donald P. Moynihan

MacKinnon and Webb offer a useful analysis of how the uncertainty of causal effects can be underestimated when observations are clustered and the treatment is applied to a very large or very small share of the clusters. Their mathematical exposition, simulation exercises, and replication analysis provide a helpful guide for how to proceed when data are poorly behaved in this way. These are valuable lessons for researchers studying the impacts of policy in observational data, where policies tend to be sluggish and thus do not generate much variability in the key explanatory variables.
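
The general problem can be illustrated with a minimal simulation sketch. It is not the code used by either set of authors, and every setting in it is an assumption chosen for illustration: 50 clusters of 20 observations, a cluster-level shock, a true treatment effect of zero, and a conventional cluster-robust variance estimator with the usual small-sample correction. The function names `cluster_robust_t` and `rejection_rate` are hypothetical.

```python
import numpy as np


def cluster_robust_t(y, d, cluster):
    """t-statistic on d from OLS of y on [1, d], using a cluster-robust variance."""
    X = np.column_stack([np.ones_like(d, dtype=float), d.astype(float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    groups = np.unique(cluster)
    G = len(groups)
    N, k = X.shape
    meat = np.zeros((k, k))
    for g in groups:
        idx = cluster == g
        score = X[idx].T @ resid[idx]      # within-cluster sum of x_i * residual_i
        meat += np.outer(score, score)
    scale = G / (G - 1) * (N - 1) / (N - k)  # common small-sample correction
    V = scale * XtX_inv @ meat @ XtX_inv
    return beta[1] / np.sqrt(V[1, 1])


def rejection_rate(n_treated, n_clusters=50, n_per=20, reps=500, crit=2.01, seed=0):
    """Share of simulated samples in which a true null effect is rejected.

    crit=2.01 approximates the 5% two-sided critical value of t with 49
    degrees of freedom, so it only matches the default n_clusters=50.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per)
    d = (cluster < n_treated).astype(float)   # treatment assigned to whole clusters
    rejections = 0
    for _ in range(reps):
        shock = rng.normal(size=n_clusters)[cluster]   # shared within each cluster
        y = shock + rng.normal(size=cluster.size)      # true treatment effect is zero
        if abs(cluster_robust_t(y, d, cluster)) > crit:
            rejections += 1
    return rejections / reps


rate_one_treated = rejection_rate(n_treated=1)
rate_balanced = rejection_rate(n_treated=25)
print("1 of 50 clusters treated: ", rate_one_treated)
print("25 of 50 clusters treated:", rate_balanced)
```

Under these assumptions, the rejection rate with a single treated cluster should sit far above the nominal 5 percent, while the balanced design should stay close to it. The exact figures depend on the seed and the settings chosen.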

### Correction of Two Errors

MacKinnon and Webb find two errors in our analysis, while nonetheless concluding “we do not regard these findings as challenging the conclusions of Burden et al. (2017).” Although we are embarrassed by the mistakes, we are also grateful for their discovery.[1] Our commitment to transparency is reflected in the fact that the data have been public for replication purposes since well before the article was published. We have posted corrected versions of the replication files and published a corrigendum with the journal where the article was originally published.

Fortunately, none of the other analyses in our article were affected. The errors affect only Table 7; Tables 2 through 6 remain intact.

We concede that when the corrections are made, the effect of early voting loses statistical significance in the model of the difference in the Democratic vote between 2008 and 2012. All of the various standard errors they report are far too large to reject the null hypothesis.

### The Problem of Limited Variation

The episode highlights the tradeoffs that researchers face between applying what appears to be a theoretically superior estimation technique (i.e., difference-in-difference) and the practical constraints of a particular application (i.e., limited variation in treatment variables) that make its use intractable. In the case of our analysis, election laws do not change rapidly, and the conclusions of our analysis were largely based on cross-sectional analyses (Tables 2-6), with the difference-in-difference largely offered as a supplemental analysis.

We agree with MacKinnon and Webb that models designed to estimate causal effects (or even simple relationships) may be quite tenuous when the number of clusters is small and the clusters are treated in a highly unbalanced fashion. In fact, we were explicit about our reluctance to apply the difference-in-difference model to our data because of the limited leverage available. As our article stated:

“A limitation of the difference-in-difference approach in our application is that few states actually changed their election laws between elections. As Table A1 (see Supplemental Material) shows, for some combinations of laws there are no changes at all. For others, the number of states changing is as low as one or two. As result, we cannot include some of the variables in the model because they do not change. For some other variables, the interpretation of the coefficients would be ambiguous given the small number of states involved; the dummy variables essentially become fixed effects for one or two states” (p. 572).

This is unfortunate in our application because the difference-in-difference models are likely to be viewed as more convincing than the cross-sectional models. This is why we offered theory suggesting that the more robust cross-sectional results were not likely to suffer from endogeneity.

The null result in the difference-in-difference models is not especially surprising given our warning above about the limited leverage provided by the dataset. Indeed, the same variable was insignificant in our model of the Democratic vote between 2004 and 2008 that we also reported in Table 7. We are left to conclude that the data are not amenable to detecting effects using difference-in-difference models. Perhaps researchers will collect data from more elections to provide more variation in the key variable and estimate parameters more efficiently.

In addition to replicating our analysis, MacKinnon and Webb conduct an extension to explore asymmetric effects. They separate the treated states into those where early voting was adopted and those where it was repealed. We agree that researchers ought to investigate such asymmetries. We recommended as much in our article: “As early voting is being rolled back in some states, future research should explore the potential asymmetry between the expansion and contraction of election practices” (p. 573). However, we think this is not feasible with existing data. As MacKinnon and Webb note, only two states adopted early voting and only one state repealed it. As a result, analyzing these cases separately, as they do, renders the treatment variables little more than fixed effects for one or two states, as we warned in our article. The coefficients might be statistically significant using various standard error calculations, but it is not clear that MacKinnon and Webb are actually estimating the treatment effects rather than something idiosyncratic about one or two states.
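
The fixed-effect point can be made concrete with a short sketch on simulated data. It assumes a stylized setting that is not the actual replication data: 48 states with 30 counties each, an outcome change driven only by state-level and county-level noise, and a treatment dummy that equals one in a single state. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_counties = 48, 30                      # assumed, illustrative sizes
state = np.repeat(np.arange(n_states), n_counties)

# Simulated change in the outcome: state shock plus county noise, no treatment effect.
state_shock = rng.normal(size=n_states)[state]
dy = state_shock + rng.normal(size=state.size)

# "Treatment" dummy that switches on for exactly one state.
treated = (state == 0).astype(float)
X = np.column_stack([np.ones(state.size), treated])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)

# The coefficient is just that state's mean outcome minus the mean of all other states.
gap = dy[state == 0].mean() - dy[state != 0].mean()

# The dummy absorbs the state's mean, so its residuals sum to zero.
treated_resid_sum = (dy - X @ beta)[state == 0].sum()

print("coefficient on dummy:", beta[1])
print("difference in means: ", gap)
print("sum of treated-state residuals:", treated_resid_sum)
```

In this sketch the estimated "effect" is identical to the single state's deviation from the others, so a policy effect cannot be separated from anything else that is particular to that state.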

### Conclusion

While the errors in our difference-in-difference analysis were regrettable, we think the greater lesson from the skilled analysis of MacKinnon and Webb is to raise further doubt about whether this tool is suitable in such a policy setting. All else equal, it may offer a superior mode of analysis; but all else is not equal. Researchers need to find the mode of analysis that best fits the limitations of the data.

### Footnotes

[1] The mistake in coding Alaska is inconsequential because, as MacKinnon and Webb note, observations from Alaska and Hawaii are dropped from the multivariate analysis.