By R. Michael Alvarez
Editor’s note: this post is contributed by R. Michael Alvarez, Professor of Political Science at the California Institute of Technology and Co-Editor of Political Analysis.
Recently, the American Political Science Association (APSA) launched an initiative to improve research ethics. One important outcome of this initiative is a joint statement that a number of our discipline’s major journals have signed: the Data Access and Research Transparency statement. These journals include Political Analysis, the American Political Science Review, the American Journal of Political Science, and at present a handful of other journals.
The joint statement outlines four important goals for these journals:
- Require that authors provide their datasets at the time of publication in a trusted digital repository.
- Require that authors provide analytic procedures so that the results in their publication can be replicated.
- Develop and implement a data citation policy.
- Make sure that journal guidelines and other materials delineate these requirements.
We are happy to report that Political Analysis has complied with these requirements, and that in the case of research replication our policies and procedures have provided an important example of how replication can be implemented by a major journal. Our compliance is visible in the new instructions for authors and reviewers that we recently issued.
Political Analysis has long had a policy that all papers published in our journal must provide replication data. Recently we have taken steps to strengthen our replication policy and to position our journal as a leader on this issue. The first step in this process was to require that, before we send an accepted manuscript to production, the authors provide the materials necessary to replicate the results reported in the manuscript. The second step was to develop a relatively simple mechanism for archiving this replication material: the Political Analysis Dataverse. We now have over 200 replication studies in our Dataverse, and these materials have been downloaded over 15,000 times.
Exactly how does this work? Typically, a manuscript successfully exits the review process, and the editors conditionally accept the manuscript for publication. One of the conditions, of course, is that the authors upload their replication materials to the journal’s Dataverse, and that they insert a citation to those materials in the final version of their manuscript. Once the materials are in the journal’s Dataverse, and the final version of the manuscript has been returned to our editorial office, both receive final review. As far as the replication materials go, that review usually involves:
- An examination of the documentation provided with the replication materials.
- Basic review of the provided code and other analytic materials.
- A basic audit of the data provided with the replication materials.
The good news is that in most cases, replication materials pass this review quickly: authors know our replication requirement, and most seem to have worked replication into their research workflow.
Despite what many may think (especially given the concerns that other journal editors frequently express when they hear about our replication policy), authors do not complain about it. We’ve not had a single instance where an author has refused to comply with our policy or balked at doing so. Instead, the vast majority of our authors upload their replication materials promptly after we request them, which indicates that they have the materials ready to go and have built the expectation of replication into their research workflow.
The problems that we encounter generally revolve around adequate documentation for the replication materials, clarity and usability of code, and issues with the replication data itself. We are working to develop better guidelines and policies on all of these fronts, but here are some initial thoughts.
First, on documentation. Authors who are developing replication materials should strive to make their materials as usable as possible for other researchers. As many authors already know, providing well-documented replication materials increases the likelihood that another scholar will download the materials and use them in their own research, which will likely generate citations for the replication materials and the original article they come from. Or a colleague at another university may use well-documented replication materials in their methods class, which puts the materials and the original article in front of many students. Perhaps a graduate student will download the materials and use them as the foundation for their dissertation work, again generating citations for the materials and the original article. The message is clear: well-documented replication materials are more likely to be used, and the more they are used, the more attention the original research will receive.
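To make this concrete, here is a minimal sketch of what the top of a replication README might look like; the file names and structure are purely hypothetical, not a journal-mandated format:

```
Replication materials for "Article Title," Political Analysis
Author(s): ...
Contact: ...

Contents:
  README.txt          this file
  data/analysis.csv   cleaned analysis dataset (no PII)
  code/01_clean.py    builds data/analysis.csv from the raw source
  code/02_models.py   fits the models and writes Tables 1-3
  code/03_figures.py  draws Figures 1-2

Requirements: Python 3 with pandas, statsmodels, and matplotlib.
To replicate: run the scripts in code/ in numbered order from the
archive root.
```

Even a short file like this tells a downstream user what is in the archive, what software they need, and in what order to run things.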
Second, clarity and usability of code. For quantitative research in social science, code (be it R, Stata, SPSS, Python, Perl, or something else) is the engine that drives the analysis. Writing code that is easy to read and use is critical for the research process. Writing good code is something that we need to focus more attention on in our research methodology curriculum, and as a profession we need more guidelines regarding good coding practices. This is an issue that we at Political Analysis will be working on in the near future, trying to develop guidelines and standards for good coding practices so that replication code is more usable.
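As an illustration of the kind of self-documenting script we have in mind, here is a hedged sketch in Python; the variable names, file paths, and model are hypothetical, not drawn from any particular article:

```python
"""Reproduce Table 1 (hypothetical example for illustration).

Reads the cleaned analysis dataset, fits the main model, and writes
the coefficient table to results/table1.csv.
"""
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(20141101)  # fix the seed so any stochastic step is reproducible


def load_data(path="data/analysis.csv"):
    """Load the analysis dataset; paths are relative to the archive root."""
    return pd.read_csv(path)


def fit_main_model(df):
    """OLS of turnout on the treatment indicator plus controls (Table 1)."""
    return smf.ols("turnout ~ treatment + age + education", data=df).fit()


if __name__ == "__main__":
    model = fit_main_model(load_data())
    # summary2().tables[1] holds the coefficient table as a DataFrame.
    model.summary2().tables[1].to_csv("results/table1.csv")
    print(model.summary())
```

Nothing here is exotic: descriptive names, relative paths, a fixed seed, and comments that say which table each function reproduces go a long way toward usable replication code.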
Finally, data. There are two primary problems that we see with replication data. The first is that authors provide data without sufficiently clearing it of personally identifying information (PII). PII is rarely necessary in replication data; again, the purpose of the replication material is to reproduce the results reported in the manuscript. There may, of course, be subsequent uses of the replication data in which another scholar wishes to link the replication materials to other datasets. In those cases we urge the producers of the replication materials to indicate in their documentation how they can be contacted to assist in that linkage process.
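As a rough sketch of the de-identification step (the column names here are hypothetical), stripping direct identifiers before depositing data can be as simple as:

```python
import pandas as pd

# Load the internal dataset, which still contains direct identifiers.
raw = pd.read_csv("internal/survey_with_pii.csv")

# Drop the columns that identify respondents; analysis variables remain.
pii_columns = ["name", "email", "street_address", "phone"]
public = raw.drop(columns=pii_columns)

# Replace the internal respondent ID with an arbitrary row number.
public = public.drop(columns=["internal_id"]).reset_index(drop=True)
public.index.name = "respondent"

public.reset_index().to_csv("data/replication_data.csv", index=False)
```

The deposited file then reproduces the published results without exposing anyone who participated in the study.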
The second problem concerns authors’ ability to provide replication data that can be made freely available. There are occasions where important research uses proprietary data; in those situations we encourage authors to let the editors know upon submission that they are using proprietary or restricted data, so that we have time to work out how the author can comply with our replication requirement. Usually the solution entails having the author provide clear details about how one would reproduce the results in the paper if one had access to the proprietary or restricted data. In many cases, those who wish to replicate a published paper may be able to obtain the restricted data from the original source, and in that situation we want them to know exactly each step that leads from raw data to final analysis.
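For example (again a hypothetical sketch), a short master script can document every step of the pipeline even when the raw data themselves cannot be deposited:

```python
"""Master replication script (hypothetical layout).

The raw data are proprietary and are NOT included in this archive; see
README.txt for how to obtain them from the original vendor. Once
raw/vendor_data.csv is in place, running this script reproduces every
result in the paper from start to finish.
"""
import subprocess

STEPS = [
    "code/01_clean.py",    # raw vendor file -> cleaned analysis dataset
    "code/02_models.py",   # analysis dataset -> Tables 1-3
    "code/03_figures.py",  # analysis dataset -> Figures 1-2
]

for step in STEPS:
    print(f"Running {step} ...")
    subprocess.run(["python", step], check=True)
```

A reader who licenses the data can then follow the exact path from raw input to published tables.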
Recently we updated our replication policies and developed other policies that help increase the transparency and accessibility of the research published in Political Analysis. Policies and best practices in these areas are still evolving rapidly, however, and we will likely update the journal’s policies frequently in the coming years as Political Analysis continues to advance journal policy in these areas. We are quite proud of all that we have accomplished with our replication and research policies, and happy that other journals look to us for guidance and advice.