A Decade of Replications: Lessons from the Quarterly Journal of Political Science

December 09, 2014

By Nicholas Eubank

Editor’s note: this piece is contributed by Nicholas Eubank, a PhD Candidate in Political Economy at the Stanford University Graduate School of Business.

The success of science depends critically on the ability of peers to interrogate published research in an effort not only to confirm its validity but also to extend its scope and probe its limitations. Yet as social science has become increasingly dependent on computational analyses, traditional means of ensuring the accessibility of research — like peer review of written academic publications — are no longer sufficient. To truly ensure the integrity of academic research moving forward, it is necessary that published papers be accompanied by the code used to generate results. This will allow other researchers to investigate not just whether a paper’s methods are theoretically sound, but also whether they have been properly implemented and are robust to alternative specifications.

Since its inception in 2005, the Quarterly Journal of Political Science (QJPS) has sought to encourage this type of transparency by requiring all submissions to be accompanied by a replication package, consisting of data and code for generating paper results. These packages are then made available with the paper on the QJPS website. In addition, all replication packages are subject to internal review by the QJPS prior to publication. This internal review includes ensuring that the code executes smoothly, that results from the paper can be easily located, and that the results generated by the replication package match those in the paper.

This policy is motivated by the belief that publication of replication materials serves at least three important academic purposes. First, it helps directly ensure the integrity of results published in the QJPS. Although the in-house screening process constitutes a minimum bar for replication, it has nevertheless identified a remarkable number of problems in papers. In the last two years, for example, 14 of the 24 empirical papers subject to in-house review were found to have discrepancies between the results generated by authors’ own code and the results in their written manuscripts.

Second, by emphasizing the need for transparent and easy-to-interpret code, the QJPS hopes to lower the costs associated with other scholars interrogating the results of existing papers. This increases the probability other scholars will examine the code for published papers, potentially identifying errors or issues of robustness if they exist. In addition, while not all code is likely to be examined in detail, it is the hope of the QJPS that this transparency will motivate submitting authors to be especially cautious in their coding and robustness checks, preventing errors before they occur.

Third and finally, publication of transparent replication packages helps facilitate research that builds on past work. Many papers published in the QJPS represent methodological innovations, and by making the code underlying those innovations publicly accessible, we hope to lower the cost to future researchers of building on existing work.

(1) In-House Replication

The experience of the QJPS in its first decade underscores the importance of its policy of in-house review. Prior to publication, all replication packages are tested to ensure code runs cleanly, is interpretable, and generates the results in the paper.

This level of review represents a sensible compromise between two extremes. On the one hand, most people would agree that an ideal replication would consist of a talented researcher re-creating a paper from scratch based solely on the paper’s written methodology section. However, undertaking such replications for every submitted paper would be cost-prohibitive in time and labor, as would having someone check an author’s code for errors line-by-line. On the other hand, direct publication of replication packages without review is also potentially problematic. Experience has shown that many authors submit replication packages that are extremely difficult to interpret or may not even run, defeating the purpose of a replication policy.

Given that the QJPS review is relatively basic, however, one might ask whether it is even worth the considerable time the QJPS invests. Experience has shown the answer is an unambiguous “yes.” Of the 24 empirical papers subject to in-house replication review since September 2012, [1] only 4 packages required no modifications. Of the remaining 20 papers, 13 had code that would not execute without errors, 8 failed to include code for results that appeared in the paper, [2] and 7 failed to include installation directions for software dependencies. Most troubling, however, 14 (58 percent) had results in the paper that differed from those generated by the author’s own code. Some of these issues were relatively small — likely arising from rounding errors during transcription — but in other cases they involved incorrectly signed or mis-labeled regression coefficients, large errors in observation counts, and incorrect summary statistics. Frequently, these discrepancies required changes to full columns or tables of results. Moreover, Zachary Peskowitz, who served as the QJPS replication assistant from 2010 to 2012, reports similar levels of replication errors during his tenure as well. The extent of the issues — which occurred despite authors having been informed their packages would be subject to review — points to the necessity of this type of in-house interrogation of code prior to paper publication.

(2) Additional Considerations for a Replication Policy

This section presents an overview of some of the most pressing and concrete considerations the QJPS has come to view as central to a successful replication policy. These considerations — and the specific policies adopted to address them — are the result of hard-learned lessons from a decade of replication experience.

2.1 Ease of Replication

The primary goal of QJPS policies is ensuring replication materials can be used and interpreted with the greatest of ease. To the QJPS, ease of replication means anyone who is interested in replicating a published article (hereafter, a “replicator”) should be able to do so as follows:

  1. Open a README.txt file in the root replication folder, and find a summary of all replication materials in that folder, including subfolders if any.
  2. After installing any required software (see Section 2.4 on Software Dependencies) and setting a working directory according to directions provided in the README.txt file, the replicator should be able simply to open and run the relevant files to generate every result and figure in the publication. This includes all results in print and/or online appendices.
  3. Once the code has finished running, the replicator should be able easily to locate the output and to see where that output is reported in the paper’s text, footnotes, figures, tables, or appendices.
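
As a purely illustrative example, a replication folder meeting these expectations might be organized as follows (all file and folder names here are hypothetical):

    replication_package/
      README.txt           (table of contents, dependencies, seed locations)
      data/
        original_data.csv  (original, untransformed variables; see Section 2.3)
      code/
        01_recode.R        (transformations and recodings)
        02_analysis.R      (generates every table and figure in the paper)
      libraries/           (copies of add-on packages; see Section 2.4)
      output/              (results, named to match the paper's tables and figures)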

2.2 README.txt File

To facilitate ease of replication, all replication packages should include a README.txt file that includes, at a minimum:

  1. Table of Contents: a brief description of every file in the replication folder.
  2. Notes for Each Table and Figure: a short list of where replicators will find the code needed to replicate all parts of the publication.
  3. Base Software Dependencies: a list of all software required for replication, including the version of the software used by the author (e.g., Stata 11.1, R 2.15.3, 32-bit Windows 7, OS X 10.9.4).
  4. Additional Dependencies: a list of all libraries or added functions required for replication, as well as the versions of the libraries and functions that were used and the location from which those libraries and functions were obtained.
    1. R: the current R version can be found by typing R.Version(), and information on loaded libraries can be found by typing sessionInfo() (a short sketch of recording this information appears after this list).
    2. Stata: Stata does not specifically “load” add-on functions in each session, but a list of all add-ons installed on a system can be found by typing ado dir.
  5. Seed locations: Authors are required to set seeds in their code for any analyses that employ randomness (e.g., simulations or bootstrapped standard errors; for further discussion, see Section 2.5). The README.txt file should include a list of the locations where seeds are set in the analyses so that replicators can find and change the seeds to check the robustness of the results.
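
For R users, one convenient way to assemble the dependency information in items 3 and 4 is to record it directly from the session used for the final run of the analysis. A minimal sketch, with an output file name of our own choosing:

    # Record the R version and all loaded libraries (with their versions)
    # so the details can be copied into the README.txt file.
    info <- c(R.Version()$version.string, capture.output(sessionInfo()))
    writeLines(info, "session_info.txt")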

2.3 Depth of Replication

The QJPS requires that every replication package include the code that computes the primary results of the paper. In other words, it is not sufficient to provide a file of pre-computed results along with the code that formats the results for LaTeX. Rather, the replication package must include everything that is needed to execute the statistical analyses or simulations that constitute the primary contribution of the paper. For example, if a paper’s primary contribution is a set of regressions, then the data and code needed to produce those regressions must be included. If a paper’s primary contribution is a simulation, then code for that simulation must be provided—not just a dataset of the simulation results. If a paper’s primary contribution is a novel estimator, then code for the estimator must be provided. And, if a paper’s primary contribution is theoretical and numeric simulation or approximation methods were used to provide the equilibrium characterization, then that code must be included.

Although the QJPS does not necessarily require the submitted code to access the data if the data are publicly available (e.g., data from the National Election Studies, or some other data repository), it does require that the dataset containing all of the original variables used in the analysis be included in the replication package. For the sake of transparency, the variables should be in their original, untransformed and unrecoded form, with code included that performs the transformations and recodings in the reported analyses. This allows replicators to assess the impact of transformations and recodings on the results.
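
As an illustration, a minimal R sketch for a paper whose primary contribution is a single regression might look as follows (the data set, variables, and model are hypothetical):

    # Load the original, untransformed data included in the package.
    dat <- read.csv("data/original_data.csv")

    # Perform every transformation and recoding in code rather than by
    # hand, so replicators can assess its impact on the results.
    dat$log_gdp <- log(dat$gdp_per_capita)                # hypothetical variable
    dat$treated <- as.integer(dat$aid > median(dat$aid))  # hypothetical recoding

    # The paper's primary result: the regression itself, not a file of
    # pre-computed coefficients.
    fit <- lm(turnout ~ treated + log_gdp, data = dat)
    summary(fit)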

2.3.1 Proprietary and Non-Public Data

If an analysis relies on proprietary or non-public data, authors are required to contact the QJPS Editors before or during initial submission. Even when data cannot be released publicly, authors are often required to provide QJPS staff access to data for replication prior to publication. Although this sometimes requires additional arrangements — in the past, it has been necessary for QJPS staff to be written into IRB authorizations — in-house review is especially important in these contexts, as papers based on non-public data are difficult if not impossible for other scholars to interrogate post-publication.

2.4 Software Dependencies

Online software repositories — like CRAN or SSC — provide authors with easy access to the latest versions of powerful add-ons to standard programs like R and Stata. Yet the strength of these repositories — their ability to ensure authors are always working with the latest version of add-ons — is also a liability for replication.

Because online repositories always provide the most recent version of add-ons to users, the software provided in response to a given query actually changes over time. Experience has shown this can cause problems when authors use calls to these repositories to install add-ons (through commands like install.packages("PACKAGE") in R or ssc install PACKAGE in Stata). As scholars may attempt to replicate papers months or years after a paper has been published, changes in the software provided in response to these queries may lead to replication failures. Indeed, the QJPS has experienced replication failures due to changes in the software hosted on the CRAN server that occurred between when a paper was submitted to the QJPS and when it was reviewed.

With that in mind, the QJPS now requires authors to include copies of all software used in the replication (including both base software and add-on functions and libraries) in their replication package, as well as code that installs these packages on a replicator’s computer. The only exceptions are extremely common tools, like R, Stata, Matlab, Java, Python, or ArcMap (although Java- and Python-based applications must be included). [3]
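
In R, for example, such an installation script might install the archived copies shipped with the replication package rather than querying CRAN. A minimal sketch, assuming the add-on packages have been saved as source tarballs in a libraries/ subfolder (the package names and versions are hypothetical):

    # Install add-on packages from the local copies included in the
    # replication package, not from the (possibly newer) versions
    # currently hosted on CRAN.
    install.packages("libraries/sandwich_2.5-1.tar.gz",
                     repos = NULL, type = "source")
    install.packages("libraries/lmtest_0.9-37.tar.gz",
                     repos = NULL, type = "source")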

2.5 Randomizations and Simulations

A large number of modern algorithms employ randomness in generating their results (e.g., the bootstrap). In these cases, replication requires both (a) ensuring that the exact results in the paper can be re-created, and (b) ensuring that the results in the paper are typical rather than cherry-picked outliers. To facilitate this type of analysis, authors should:

  1. Set a random number generator seed in their code so it consistently generates the exact results in the paper;
  2. Provide a note in the README.txt file indicating the location of all such commands, so replicators can remove them and test the representativeness of the results.
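
A minimal R sketch of what this looks like in practice, using a hypothetical bootstrap:

    # Seed set here so the bootstrap reproduces the exact results in the
    # paper; the README.txt records this location so replicators can
    # change or remove the seed to check that the results are typical.
    set.seed(20141209)

    x <- rnorm(500)                                    # hypothetical sample
    boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
    quantile(boot_means, c(0.025, 0.975))              # bootstrap interval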

In spite of these precautions, painstaking experience has shown that setting a seed is not always sufficient to ensure exact replication. For example, some libraries generate slightly different results on different operating systems (e.g., Windows versus OS X) and on different hardware architectures (e.g., 32-bit Windows 7 versus 64-bit Windows 7). To protect against such surprises, we encourage authors to test their code on multiple platforms and to document any resulting exceptions or complications in their README.txt file.

2.6 ArcGIS

Although we encourage authors to write replication code for their ArcGIS-based analyses using the ArcPy scripting utility, we recognize that most authors have yet to adopt this tool. For the time being, the QJPS accepts detailed, step-by-step instructions for replicating results via the ArcGIS Graphical User Interface (GUI). However, as with the inclusion and installation of add-on functions, the QJPS has made a tutorial on using ArcPy available to authors, which we hope will accelerate the transition towards use of this tool. [4]

(3) Advice to Authors

In addition to the preceding requirements, the QJPS also provides authors with some simple guidelines to help prevent common errors. These suggestions are not mandatory, but they are highly recommended.

  1. Test files on a different computer, preferably with a different operating system: Once replication code has been prepared, the QJPS suggests authors email it to a different computer, unzip it, and run it. Code often contains small dependencies—things like implicit software requirements or hard-coded file locations—that go unnoticed until replication. Running the code on a different computer often exposes these issues in a way that running it on one’s own machine does not.
  2. Check every code-generated result against your final manuscript PDF: The vast majority of replication problems emerge because authors either modified their code but failed to update their manuscript, or made an error while transcribing their results into their paper. With that in mind, authors are strongly encouraged to print out a copy of their manuscript and check each result before submitting the final version of the manuscript and replication package.
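
One habit that can make this final check easier, though it is not a QJPS requirement, is writing each code-generated result to an output file whose name identifies where it appears in the manuscript. A minimal R sketch with hypothetical file and variable names:

    # Write each result to a file named after the table it populates, so
    # every number in the manuscript can be traced back to its source.
    dir.create("output", showWarnings = FALSE)
    dat <- read.csv("data/original_data.csv")
    fit <- lm(turnout ~ log(gdp_per_capita), data = dat)  # hypothetical model
    capture.output(summary(fit), file = "output/table1_main_regression.txt")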

(4) Conclusion

As the nature of academic research changes, becoming ever more computationally intensive, so too must the peer review process. This paper provides an overview of many of the lessons learned from the QJPS’s attempt to address this need. Most importantly, however, it documents not only the importance of requiring the transparent publication of replication materials but also the strong need for in-house review of these materials prior to publication.

Notes

[1] September 2012 is when the author took over responsibility for all in-house interrogations of replication packages at the QJPS.

[2] This does not include code which failed to execute, which might also be thought of as failing to replicate results from the paper.

[3] To aid researchers in meeting this requirement, detailed instructions on how to include CRAN or SSC packages in replication packages are provided through the QJPS.

[4] ArcPy is a Python-based tool for scripting in ArcGIS.

Editor’s Note [6/22/2015]: the number of packages with discrepancies was corrected from 13 (54%) to 14 (58%) at the author’s request.