By Thomas J. Leeper
Peer review — by which I mean the decision to publish research manuscripts based upon the anonymous critique of approximately three individuals — is a remarkably recent innovation. As a June 2015 article in Times Higher Education makes clear, the content of scholarly journals has historically been decided almost entirely by the non-anonymous judgment of a single known individual, namely the journal’s editor (Fyfe 2015). Indeed, even in the earliest “peer-reviewed” journal, Philosophical Transactions, review was conducted by something much closer to an editorial board than a broad, blind panel of scientific peers (Spier 2002, 357). Biagioli (2002) describes how early academic societies in 17th-century England and France implemented review-like procedures less out of concern for scientific integrity than out of concern for their public credibility. Publishing questionable or objectionable content under the banner of their society was risky; publishing only papers from known members of the society or from authors who had been vetted by such members became policy early on (Bornmann 2013).
Despite these early peer reviewing institutions, major scientific journals retained complete discretionary editorial control well into the 20th century. Open science advocate Michael Nielsen highlights that there is documentary evidence that only one of Einstein’s papers was ever subjected to modern-style peer review (Nielsen 2009). The journal Nature adopted a formal system of peer review only in 1967. Public Opinion Quarterly, as another example, initiated a peer review process only out of necessity, in response to “impressive numbers of articles from contributors who were outside the old network and whose capabilities were not known to the editor” (Davison 1987, S8), and even then the process largely involved members of the editorial board.[1]
While scholars are quick to differentiate a study “not yet peer reviewed” from one that has passed this invisible threshold, peer review is but one route to scientific credibility and not a particularly reliable method of enhancing scientific quality given its present institutionalization. In this article, I argue that peer review must be seen as part of a set of scientific practices largely intended to enhance the credibility of truth claims. As disciplines from political science to psychology to biomedicine to physics debate the reproducibility, transparency, replicability, and truth of their research literatures,[2] we must find a proper place for peer review among a host of individual and disciplinary activities. Peer review should not be abandoned, but it must be reformed in order to retain value for contemporary science, particularly if it is also meant to improve the quality of science rather than simply the perception that scientific claims are credible.
The multiple routes to credibility
As Skip Lupia and Colin Elman have argued, anyone in society can make knowledge claims (Lupia and Elman 2014). The sciences have no inherent privilege or uncontested authority in this regard. Science, however, distinguishes itself through multiple forms of claimed credibility. Peer review is one such path to credibility. When mainstream scientists make claims, they do so with the provisional non-rejection of (hopefully) similarly or more able peers. Regardless of what peer review actually does to the content or quality of science,[3] it collectivizes an otherwise individual activity, making claims appear more credible because once peer reviewed they carry the weight of the scientific societies orchestrating the review process.
Peer review as currently performed is essentially an outward-facing activity. It enhances the standing of scientific claims relative to claims made by others above and beyond the credibility lent simply by one being a scientist as opposed to someone else (Hovland and Weiss 1951; Pornpitakpan 2004). As such, peer review is not an essential aspect of science, but rather a valuable aspect of science communication. Were we principally concerned with the impact of peer review on scientific quality (as opposed to the appearance of scientific quality), we would (1) rightly acknowledge the informal peer review that almost all research is subject to (through presentation, informal conversation, and public discussion), (2) conduct peer review earlier and perhaps multiple times during the execution of the scientific method (rather than simply at its conclusion), and (3) subject all forms of public scientific claim-making (such as conference presentations, book publication, blog and social media posts, etc., which are only reviewed in some cases) to more rigorous, centralized peer review. These things we do not do because — as has always been the case — peer review is primarily about the protection of the credibility of the groups publishing research.[4] If peer review were the only way of enhancing the credibility of knowledge claims, this slightly impotent system of post-analysis/pre-publication review might make sense, particularly if it had demonstrated value apart from its contribution to scientific credibility.
Yet, the capacity of peer review processes to enhance rather than diminish scientific quality is questionable, given how peer review introduces well-substantiated publication biases (Sterling 1959; Sterling, Rosenbaum, and Weinkam 1995; Gelman and Weakliem 2009), is highly unreliable,[5] and offers a meager barrier to scientific fraud (as political scientists know all too well).
Alternative yet complementary paths to credibility that are unique to science include the three other R’s: registration, reproducibility, and replication. Registration is the documentation of scientific projects before they are conducted (Humphreys, Sanchez de la Sierra, and van der Windt 2013). Registration aims to avoid publication biases, in particular the “file drawer” problem (Rosenthal 1979). Registering and ideally pre-accepting such kernels of research provides a uniquely powerful way of combating publication biases. Registration indicates that research was worth doing regardless of what it finds and offers some partial guarantee that the results were not selected for their size or substance. The real advantage of registration will come when it is combined with processes of pre-implementation peer review (which I argue for below), because publications will then be accepted based on the degree to which they are interesting, novel, and valid, without regard to the particular findings that result from the research. Pilot tests of such pre-registration reviewing at Comparative Political Studies, Cortex, and The Journal of Experimental Political Science should prove quite interesting.
Reproducibility relates to the transparency of conducted research and the extent to which data sources and analysis are publicly documented, accessible, and reusable (Stodden, Guo, and Ma 2013). Reproducible research translates a well-defined set of inputs (e.g., raw data sources, code, computational environment) into the set of outputs used to make knowledge claims: statistics, graphics, tables, presentations, and articles. In particular, it helps to avoid consequential errors that result in (un)intended misreporting of results (see, provocatively, Nuijten et al. 2015). Reproducibility invites an audience to examine results for themselves, offering self-imposed public accountability as a route to credibility.
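To make this concrete, the following is a minimal sketch of a reproducible pipeline, in which every reported number and figure is regenerated from raw inputs by code. The file names, variable names, and model specification are hypothetical placeholders, not drawn from any particular study.

```python
# A minimal sketch of a reproducible analysis pipeline. All file and
# variable names here are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# (1) Well-defined input: the raw data file, never edited by hand
df = pd.read_csv("raw_data.csv")

# (2) All cleaning and recoding steps are scripted, not manual
df["treated"] = (df["condition"] == "treatment").astype(int)

# (3) Reported estimates are written out by code, not transcribed by hand
model = smf.ols("outcome ~ treated", data=df).fit()
with open("table1.txt", "w") as f:
    f.write(model.summary().as_text())

# (4) Manuscript graphics are likewise regenerated on demand
df.groupby("treated")["outcome"].mean().plot(kind="bar")
plt.savefig("figure1.png")
```

Anyone holding the raw file and this script can rerun the entire analysis and check that the published statistics and graphics follow from the stated inputs, which is exactly the self-imposed public accountability described above.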
Replication involves the repeated collection and analysis of data along a common theme. Among the three R’s, replication is clearly political science’s most wanting characteristic. Replication — due to its oft-claimed lack of novelty — is seen as secondary science, unworthy of publication in the discipline’s top scientific outlets. Yet replication is what ensures that findings are not simply the result of sampling errors, extremely narrow scope conditions, poorly implemented research protocols, or some other limiting factor. Without replication, all we have is Daryl Bem’s claims of precognition. Replication builds scientific literatures that offer far more than a collection of ad-hoc, single-study claims.[6] Replication further invites systematic review that in turn documents heterogeneity in effect sizes and the sensitivity of results to samples, settings, treatments, and outcomes (Shadish, Cook, and Campbell 2001).
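The sampling-error point is partly just arithmetic: at a conventional 5% false-positive rate, a true-null “finding” clears significance in one study 5% of the time, but in a study plus two independent replications only 0.05³ of the time, or roughly once in 8,000 attempts. A short simulation sketch makes the same point; the sample size and number of simulated studies below are arbitrary illustrative choices.

```python
# Illustrative simulation: a "finding" that is pure sampling error
# clears p < .05 in one study far more often than in three independent
# studies. Sample size and simulation count are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sims, alpha = 100, 10_000, 0.05

def null_study_significant() -> bool:
    """Two-group study in which the true effect is exactly zero."""
    _, p = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n))
    return p < alpha

single = np.mean([null_study_significant() for _ in range(sims)])
print(f"One study fooled by chance: {single:.3f}")                      # ~0.050
print(f"Original plus two independent replications: {single**3:.6f}")  # ~0.000125
```

Because independent replications multiply these false-positive probabilities together, even a small number of replications dramatically strengthens a literature.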
These three R’s of modern, open science are complementary routes to scientific credibility alongside scientific review. The challenge is that none of them guarantees scientific quality. Registered research might be poorly conceived, reproducible research might be laden with errors, and replicated research might be pointless or fundamentally flawed. Review, however, is uniquely empowered to improve the quality of science because of its capacity for both binding and conversational input from others. Yet privileging peer review over these other forms of credibility enhancement prioritizes an anachronistic, opaque, and highly contrived social interaction of ambiguous effect over alternatives with face-valid positive value. The three R’s are credibility enhancing because they offer various forms of lasting public accountability. Peer review, by contrast, does not. How then can review be improved in order to more transparently enhance quality and thus lend further credibility to scientific claims? The answer comes in both institutional reforms and changes in reviewer behavior.
Toward better peer review
First, several institutional changes are clearly in order:
- Greater transparency. The peer review process produces an enormous amount of metadata, all of which should be public (including the reviews themselves). Without such information, it is difficult to evaluate the effectiveness of the institution, and without transparency there is little opportunity for accountability in the process. Freire (2015) makes a compelling argument for how this might work in political science. At a minimum, releasing data on author characteristics, reviewer characteristics and decisions, and manuscript topics (and perhaps coding of research contexts, methods, and findings) would enable exploratory research on correlates of publication decisions (see the illustrative schema after this list). Releasing the reviews themselves after an embargo period would hold reviewers (anonymously) accountable for the content and quality of reviews and enable further research into perhaps-subtle biases in the reviewing process. Accountability for the content of reviews and for peer review decisions should — following the effects of accountability generally (Lerner and Tetlock 1999) — improve decision quality.
- Earlier peer review. Post-analysis peer review is an immensely ineffective method of enhancing scientific quality. Reviewers are essentially fettered, able only to comment on research that has little to no hope of being conducted anew. This invites outcome-focused review and superficial, editorial commentary. Peer review should instead come earlier in the scientific process, ideally prior to analysis and prior to data collection, when it would still be possible to change the theories, hypotheses, data collection, and planned analysis. (And when it might filter out research that is so deficient that resources should not be expended on it.) If peer review is meant to affect scientific quality, then it must occur when it has the capacity to actually affect science rather than manuscripts. Such review would replace post-analysis publication and would need to be binding on journals, so that they cannot refuse to publish “uninteresting” results of otherwise “interesting” studies. Pre-reviewed research should also have higher credibility because it has a plausible claim of objectivity: reporting is not based on novelty or selective reporting of ideologically favorable results.[7] If peer review continues to occur after data are collected and analyzed, a lesser form of “outcome-blind” reviewing could at least constrain reviewer-induced publication biases.
- Bifurcated peer review. Journals, owned by for-profit publishers and academic societies, care about rankings. It is uncontroversial that this invites a focus on research outcomes, sometimes at the expense of quality. It also means that manuscripts often cycle through numerous review processes before being published, haphazardly placing the manuscript’s fate in the hands of one small set of reviewers after another. Removing peer review from the control of journals would streamline this process, subjecting a manuscript to only one peer review process, the conclusion of which would be a competitive search for the journal of best fit.[8] Because review would not be connected to any specific outlet, reviews could focus on improving quality without considering whether research is “enough” for a given journal. A counterargument would be that bifurcation means centralization, and with it an even greater dependence on unreliable reviews. To the contrary, a centralized process would be better able to equitably distribute reviewer workloads, increase the number of reviews per manuscript,[9] potentially increase reviewers’ engagement with and commitment to improving a given manuscript, and enable the review process to involve a dialogue between authors and reviewers rather than a one-off, one-way communication. Like a criminal trial, one could also imagine various experimental innovations such as allowing pre-review stages of evidence discovery and reviewer selection. But the overall goal of a bifurcated process would be to publish more research, publish concerns about that research alongside the manuscripts, and allow journals to focus on recommending already reviewed research.
- Post-publication peer review. Publicly commenting on already published manuscripts through venues like F1000Research, The Winnower, or RIO helps to put concerns, questions, uncertainties, and calls for replication and future research into the permanent record of science. Research is constantly “peer reviewed” through the normal process of scientific discussion and occasional replication, but discussions are rarely recorded as expressed concerns about manuscripts, so post-publication peer review ensures that publication is not the final say on a piece of research.
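To illustrate the kind of metadata release described under “Greater transparency” above, a journal might publish one record per completed review along the following lines. Every field name here is invented for illustration; an actual release would track whatever the editorial system records.

```python
# Hypothetical schema for released peer-review metadata, one record per
# completed review. All field names are invented for illustration.
review_record = {
    "manuscript_id": "ms-2015-0042",        # stable anonymous identifier
    "submission_date": "2015-06-01",
    "manuscript_topic": "voting behavior",  # coded topic, not the title
    "reviewer_id": "r-117",                 # anonymous but persistent
    "reviewer_recommendation": "major revision",
    "editor_decision": "revise and resubmit",
    "review_text_released": False,          # set to True after the embargo
}
```

Even this small set of fields would support exploratory analyses of publication decisions while preserving the anonymity of authors and reviewers.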
While these reforms are attractive ways to improve the peer review process, widespread reform is probably a long way off. How can reviewers behave today in order to enhance the credibility of scientific research while also (ideally) improving the actual quality of scientific research? Several ideas come to mind:
- Avoid introducing publication bias. As a scientist, I should not let the size, direction, or significance of a finding affect whether I see that research as well-conducted or as meriting wider dissemination.[10] This requires outcome-blind reviewing, even if self-imposed. It also means that novelty is not a useful criterion when evaluating a piece of research.
- Focus on the science, not the manuscript. An author has submitted a piece of research that they feel moderately confident about. It is not the task of reviewers to rewrite the manuscript; that is the work of the authors and editors. Reviewers need to focus on the science, not superficial issues in the manuscript. A reviewer does not have to be happy with a manuscript; they simply have to not be dissatisfied with the apparent integrity of the underlying science.
- Consider reviewing a conversation. The purpose is to enhance scientific quality (to the extent possible after research has already been conducted). This means that a lack of clarity should be addressed through questions, not rejections or demands for alternative theories. Reviews should not be shouted through the keyboard but rather seen as the initiation of a dialogue intended to clarify both the manuscript and the reviewer’s understanding of the research. In short, be nice.
- Focus on the three other R’s. Reviewers should ensure that authors have engaged in transparent, reproducible reporting, and they should reward authors engaged in registration, reproducibility, and replication. If necessary, they should ask to see the data or features thereof to address concerns, even if those materials cannot be made a public element of the research. They should never demand analytic fishing expeditions, quests for significance and novelty, or post-hoc explanations for features of data.
These are simple principles, but it is surprising how rarely they are followed. To maximize its value, review must be much more focused on scientific quality than on novelty-seeking and editorial superficiality. The value of peer review is that it can actually improve the quality of science, which in turn makes scientific claims more publicly credible. But it can also damage scientific quality. This means that the activity of peer review within current institutional structures should take a different tone and focus, but the institution itself should also be substantially reformed. Further, political science should be in the vanguard of adopting practices of registration, reproducibility, and replication to broadly enhance the credibility of our collective scientific contribution. Better peer review will ensure those scientific claims not only appear credible but actually are credible.
References
Biagioli, Mario. 2002. “From Book Censorship to Academic Peer Review.” Emergences: Journal for the Study of Media & Composite Cultures 12(1): 11–45.
Bornmann, Lutz. 2013. “Evaluations by Peer Review in Science.” Springer Science Reviews 1(1-2): 1–4.
Bornmann, Lutz, Rüdiger Mutz, and Hans-Dieter Daniel. 2010. “A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants.” PLoS ONE 5(12): e14331.
Campanario, Juan Miguel. 1998a. “Peer Review for Journals as It Stands Today — Part 1.” Science Communication 19(3): 181–211.
Campanario, Juan Miguel. 1998b. “Peer Review for Journals as It Stands Today — Part 2.” Science Communication 19(4): 277–306.
Cicchetti, Domenic V. 1991. “The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation.” Behavioral and Brain Sciences 14: 119–186.
Davison, W. Phillips. 1987. “A Story of the POQ’s Fifty-Year Odyssey.” Public Opinion Quarterly 51: S4–S11.
Freire, Danilo. 2015. “Peering at Open Peer Review.” https://thepoliticalmethodologist.com/2015/12/08/peering-at-open-peer-review/
Fyfe, Aileen. 2015. “Peer Review: Not as Old As You Might Think.” https://www.timeshighereducation.com/features/peer-review-not-old-you-might-think
Gelman, Andrew, and David Weakliem. 2009. “Of Beauty, Sex and Power.” American Scientist 97(4): 310.
Hovland, Carl I., and Walter Weiss. 1951. “The Influence of Source Credibility on Communication Effectiveness.” Public Opinion Quarterly 15(4): 635–650.
Humphreys, Macartan, Raul Sanchez de la Sierra, and Peter van der Windt. 2013. “Fishing, Commitment, and Communication: A Proposal for Comprehensive Nonbinding Research Registration.” Political Analysis 21(1): 1–20.
Jefferson, Tom, Melanie Rudin, Suzanne Brodney Folse, and Frank Davidoff. 2007. “Editorial Peer Review for Improving the Quality of Reports of Biomedical Studies (Review).” Cochrane Database of Systematic Reviews 2(MR000016).
Lerner, Jennifer S., and Philip E. Tetlock. 1999. “Accounting for the Effects of Accountability.” Psychological Bulletin 125(2): 255–75.
Lupia, Arthur, and Colin Elman. 2014. “Openness in Political Science: Data Access and Research Transparency.” PS: Political Science & Politics 47(1): 19–42.
Nielsen, Michael. 2009. “Three Myths About Scientific Peer Review.” http://michaelnielsen.org/blog/three-myths-about-scientific-peer-review/
Nuijten, Michele B., Chris H. J. Hartgerink, Marcel A. L. M. van Assen, Sacha Epskamp, and Jelte M. Wicherts. 2015. “The Prevalence of Statistical Reporting Errors in Psychology (1985–2013).” Behavior Research Methods, in press.
Peters, Douglas P., and Stephen J. Ceci. 1982. “Peer-Review Practices of Psychological Journals: The Fate of Published Articles, Submitted Again.” The Behavioral and Brain Sciences 5: 187–255.
Pornpitakpan, Chanthika. 2004. “The Persuasiveness of Source Credibility: A Critical Review of Five Decades’ Evidence.” Journal of Applied Social Psychology 34(2): 243–281.
Price, Eric. 2014. “The NIPS Experiment.” http://blog.mrtz.org/2014/12/15/the-nips-experiment.html
Rosenthal, Robert. 1979. “The ‘File Drawer Problem’ and Tolerance for Null Results.” Psychological Bulletin 86(3): 638–641.
Schimmack, Ulrich. 2012. “The Ironic Effect of Significant Results on the Credibility of Multiple-Study Articles.” Psychological Methods 17(4): 551–66.
Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton-Mifflin.
Smith, Richard. 2006. “Peer Review: A Flawed Process at the Heart of Science and Journals.” Journal of the Royal Society of Medicine 99: 178–182.
Smith, Richard. 2015. “The Peer Review Drugs Don’t Work.” https://www.timeshighereducation.com/content/the-peer-review-drugs-dont-work
Spier, Ray. 2002. “The History of the Peer-Review Process.” Trends in Biotechnology 20(8): 357–358.
Sterling, Theodore D. 1959. “Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance–Or Vice Versa.” Journal of the American Statistical Association 54(285): 30–34.
Sterling, Theodore D., W. L. Rosenbaum, and J.J. Weinkam. 1995. “Publication Decisions Revisited: The Effect of the Outcome of Statistical Tests on the Decision to Publish and Vice Versa.” The American Statistician 49(1): 108–112.
Stodden, Victoria, Peixuan Guo, and Zhaokun Ma. 2013. “Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals.” PLoS ONE 8(6): e67111.
The Journal of the American Medical Association. 1990. Vol. 263. American Medical Association.
Notes
- Campanario (1998a) and Campanario (1998b) provide an impressively comprehensive overview of the history and present state of peer review as an institution.
- See both the now 25-year-old debate in The Journal of the American Medical Association reporting on the First International Congress on Peer Review in Biomedical Publication (The Journal of the American Medical Association 1990) and a recent debate in Nature for an excellent set of perspectives on peer review.
- Though only one meta-analysis, a 2007 Cochrane Collaboration systematic review of studies of peer review found no evidence of an impact of peer review on the quality of biomedical publications (Jefferson et al. 2007). More anecdotally, Richard Smith, long-time editor of The British Medical Journal, has repeatedly criticized peer review as irrelevant (Smith 2006), even to the point of arguing for its abolition (Smith 2015).
- Justin Esarey rightly points out that “peer review” is used here in the sense of a formal institution. One could argue that there is — and has always been — a peer reviewing market at work that comments on, critiques, and responds to scientific contributions. This is most visible in social media discussions of newly released research, but also in the discussant comments offered at conferences and in informal conversations between scholars. Returning to the Einstein example from earlier, the idea of general relativity was never peer reviewed prior to or as part of publication but has certainly been subjected to review by generations of physicists.
- Cicchetti (1991) provides a useful meta-analysis of the reliability of grant reviews. Peters and Ceci (1982), in a classic study, found that highly cited papers, once resubmitted, were frequently rejected by the same journals that had originally published them, with little recognition of the plagiarism. For a more recent study of consistency across reviews, see this blog post about the 2014 NIPS conference (Price 2014). And, again, see Campanario (1998a,b) for a thorough overview of the relevant literature.
- It is important to caution, however, against demanding multi-study papers. Conditioning publication of a given article on having replications introduces a needless file drawer problem with limited benefit to the scientific literature. Schimmack (2012), for example, shows that multi-study papers appear more credible than single-study papers even when the latter approach has higher statistical power.
- Some opportunities for pre-analysis peer review already exist, including Time-Sharing Experiments for the Social Sciences, funding agency review processes, and some journal efforts (such as in a recent Comparative Political Studies special issue or the new “registration” track of the Journal of Experimental Political Science).
- Or, ideally, the entire abandonment of journals as an antiquated, space-restricted venue for research dissemination.
- Overall reviewer burden would be constant or would even decrease, because the same manuscript would not need to be passed to a completely new set of reviewers at each journal. Given that current reviewing processes involve a small number of reviewers per manuscript, the results of review processes tend to have low reliability (Bornmann, Mutz, and Daniel 2010); increasing the number of reviewers would tend to counteract that (see the sketch following these notes).
- Novel, unexpected, or large results may be exciting, and publishing them may enhance the impact factor or standing of a journal. But if peer review is to focus on the core task of enhancing scientific quality, then those concerns are irrelevant. If peer review were instead designed to identify the most exciting, most “publishable” research, then authors should be monetarily incentivized to produce such work and peer reviewers should be paid for their time in identifying such studies. The public credibility of claims published in such a journal would obviously be diminished, however, highlighting the need for peer review as an exercise in quality alone as a route to credibility. One might argue that offering such judgments is helpful because editors work with a finite number of printable pages. Such arguments are increasingly dated, as journals like PLoS demonstrate that there is no upper bound on scientific output once the constraints of print publication are removed.
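As a back-of-the-envelope illustration of the reliability claim in note [9], the Spearman–Brown prophecy formula from classical test theory projects the reliability of an average of k reviewers’ judgments from a single-reviewer reliability ρ as kρ / (1 + (k − 1)ρ). This is a standard psychometric approximation, and the single-reviewer reliability of 0.3 used below is a hypothetical value chosen purely for illustration:

```python
# Spearman-Brown projection of the reliability of an *average* of k
# reviewer judgments. rho = 0.3 is a hypothetical single-reviewer
# reliability chosen purely for illustration.
def spearman_brown(rho: float, k: int) -> float:
    return k * rho / (1 + (k - 1) * rho)

for k in (1, 2, 3, 5, 10):
    print(f"{k:2d} reviewer(s): aggregate reliability = {spearman_brown(0.3, k):.2f}")
# 1: 0.30, 2: 0.46, 3: 0.56, 5: 0.68, 10: 0.81
```

Under this approximation, even modest pooling across additional reviewers substantially improves the reliability of the aggregate judgment, consistent with the argument for a centralized review process above.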